If you’re looking for a cloud database service in Amazon Web Services (AWS), which service do you choose? Amazon Redshift or Amazon Relational Database Service (RDS)? The answer may surprise you.
Redshift and RDS are similar database services in AWS that sometimes get mistaken for each other, but each fulfills a different use case. And don’t think you have to pick one over the other: you can deploy them side by side if you need to. You can even integrate database platforms like MySQL or PostgreSQL with both services.
In this article, you’ll learn what each service is, ways to use them, and how they can integrate with other databases.
If you need a relational database in the cloud, look no further than RDS. RDS is a quick and easy way to bring up a database without worrying about the infrastructure.
Because RDS runs on the AWS cloud, you can scale an instance quickly as your workload grows. The main limit is storage: a maximum of 64 TB per database instance for most engines.
With RDS, you also choose which database engine to use: Amazon Aurora, Oracle, Microsoft SQL Server, MySQL, PostgreSQL, or MariaDB.
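To make the engine choice concrete, here is a minimal sketch that assembles arguments for the boto3 RDS client's `create_db_instance` call. The engine identifiers (`mysql`, `mariadb`, `postgres`) are the values RDS expects; the instance name and the `db.t3.micro` default class are hypothetical. For simplicity the sketch covers only three engines and applies the 64 TB (65,536 GiB) ceiling as a single assumed cap. Aurora is provisioned as a cluster rather than a standalone instance, so it is left out. The function only builds a dictionary and does not contact AWS.

```python
"""Assemble parameters for boto3's RDS create_db_instance call (sketch; makes no AWS calls)."""

# Friendly engine names mapped to the Engine values RDS expects.
# Only three engines are covered; Aurora (provisioned as a cluster),
# Oracle, and SQL Server are deliberately left out of this sketch.
ENGINES = {
    "MySQL": "mysql",
    "MariaDB": "mariadb",
    "PostgreSQL": "postgres",
}

# Assumed storage ceiling: 64 TiB expressed in GiB, the unit RDS uses.
MAX_STORAGE_GIB = 64 * 1024


def build_db_instance_params(
    name: str,
    engine: str,
    storage_gib: int,
    instance_class: str = "db.t3.micro",  # hypothetical default size
) -> dict:
    """Return keyword arguments for rds.create_db_instance().

    Credentials are intentionally omitted; add them (for example,
    MasterUsername plus a password or managed secret) before calling AWS.
    """
    if engine not in ENGINES:
        raise ValueError(f"engine not covered by this sketch: {engine}")
    if not 0 < storage_gib <= MAX_STORAGE_GIB:
        raise ValueError(f"storage must be between 1 and {MAX_STORAGE_GIB} GiB")
    return {
        "DBInstanceIdentifier": name,
        "Engine": ENGINES[engine],
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gib,
    }
```

After adding credentials, you would pass the result to `boto3.client("rds").create_db_instance(**params)`. Keeping parameter assembly separate from the API call makes the engine choice easy to check without touching AWS; note that the real service also enforces per-engine storage minimums that this sketch does not check.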
If you’re looking for a cloud equivalent of your on-premises database, RDS is a good option for you.
If RDS is the cloud counterpart of your on-premises database, Redshift is the counterpart of your enterprise data warehouse.
Like RDS, Redshift can scale, and it scales big: up to the petabyte level. Redshift reaches that scale by deploying in clusters, which grow in both capacity and performance as you add nodes.
Additionally, you can choose node types that favor compute performance or storage capacity.
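To see where node count and node type come in, here is a minimal sketch that assembles arguments for the boto3 Redshift client's `create_cluster` call. The cluster name, the `analytics` database name, and the `ra3.xlplus` node type are hypothetical examples, and the function only builds a dictionary; it does not contact AWS.

```python
"""Assemble parameters for boto3's Redshift create_cluster call (sketch; makes no AWS calls)."""


def build_cluster_params(name: str, node_type: str, nodes: int) -> dict:
    """Return keyword arguments for redshift.create_cluster().

    Capacity and performance grow with the number of nodes, while the node
    type sets the balance of compute and storage on each node. Credentials
    are intentionally omitted; add them before calling AWS.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    params = {
        "ClusterIdentifier": name,
        "NodeType": node_type,  # e.g. "ra3.xlplus" (example value)
        "DBName": "analytics",  # hypothetical database name
    }
    if nodes == 1:
        # Single-node clusters can omit NumberOfNodes.
        params["ClusterType"] = "single-node"
    else:
        # Multi-node clusters must say how many compute nodes to launch.
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = nodes
    return params
```

Growing an existing cluster later is a separate call, `resize_cluster(ClusterIdentifier=..., NumberOfNodes=...)`, which is the programmatic form of adding more nodes.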
Redshift was originally forked from PostgreSQL. If you’re familiar with PostgreSQL, Redshift should be easy for you to pick up.
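Because of that PostgreSQL heritage, ordinary PostgreSQL tooling can usually connect to Redshift. As a small sketch, the function below builds a libpq-style connection string of the kind psycopg2 accepts. The endpoint and user are hypothetical; 5439 is Redshift's default port, and `dev` is assumed as the database name. Values are assumed to contain no spaces or quotes, so no escaping is done.

```python
"""Build a libpq connection string for a Redshift cluster (sketch)."""


def redshift_dsn(host: str, user: str, dbname: str = "dev", port: int = 5439) -> str:
    """Return a libpq-style DSN that PostgreSQL drivers such as psycopg2 accept.

    5439 is Redshift's default port; "dev" is an assumed database name.
    Values are assumed to contain no spaces or quotes (no escaping is done).
    """
    parts = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": "require",  # encrypt traffic to the cluster
    }
    return " ".join(f"{key}={value}" for key, value in parts.items())
```

`psycopg2.connect(redshift_dsn(...))` would then open a session, with the password supplied through libpq's usual mechanisms such as the `PGPASSWORD` environment variable or a `.pgpass` file. Keep in mind that Redshift has diverged from modern PostgreSQL, so not every PostgreSQL feature is available.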
If traditional database engines aren’t providing the performance or scalability you need, you should think about using Redshift.
Even though RDS and Redshift are similar, each provides unique benefits, and you can use them in tandem. For example, you can deploy a PostgreSQL RDS instance next to your Redshift cluster. Redshift is compatible enough with PostgreSQL that your RDS database can query Redshift and bring the results back to RDS for processing.
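One common way to let an RDS PostgreSQL instance query Redshift is through the `postgres_fdw` and `dblink` extensions: you register the Redshift cluster as a foreign server and map an RDS user to a Redshift user. The sketch below only generates those one-time setup statements as strings. The server name, users, endpoint, and password are hypothetical, and the server and RDS user names are assumed to be simple lowercase identifiers that need no quoting.

```python
"""Generate one-time setup SQL so RDS PostgreSQL can query Redshift (sketch)."""


def quote_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def dblink_setup_sql(
    server: str,
    host: str,
    dbname: str,
    rds_user: str,
    rs_user: str,
    rs_password: str,
    port: int = 5439,
) -> list:
    """Return the statements to run once in the RDS PostgreSQL database.

    `server` and `rds_user` are assumed to be simple lowercase identifiers.
    """
    options = ", ".join([
        f"host {quote_literal(host)}",
        f"port {quote_literal(str(port))}",
        f"dbname {quote_literal(dbname)}",
        "sslmode 'require'",
    ])
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        "CREATE EXTENSION IF NOT EXISTS dblink;",
        # Register the Redshift cluster as a foreign server.
        f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw OPTIONS ({options});",
        # Map the local RDS user to Redshift credentials.
        f"CREATE USER MAPPING FOR {rds_user} SERVER {server} "
        f"OPTIONS (user {quote_literal(rs_user)}, password {quote_literal(rs_password)});",
    ]
```

In practice you would keep the password out of source code, for example by reading it from a secrets store at run time.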
As covered on the AWS Big Data Blog, an executive dashboard is a great example of using both services together. You create a materialized view in your PostgreSQL RDS instance that caches frequently requested data from Redshift: the view's defining query runs against Redshift, and its results are stored in RDS. When the next dashboard query comes in, the materialized view answers it.
The query runs entirely within your PostgreSQL RDS instance, bypassing Redshift altogether. Later, you refresh the materialized view so the cached data doesn't go stale.
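The caching pattern can be sketched as two SQL statements: one that creates the materialized view from a `dblink` query against Redshift, and one that refreshes it. The Python helper below just generates them. The view name, server name, query, and column types are hypothetical; the server name is assumed to contain no quotes, and `$REDSHIFT$` is used as a dollar-quote delimiter around the remote query.

```python
"""Generate SQL for an RDS PostgreSQL materialized view that caches Redshift data (sketch)."""


def cached_view_sql(view: str, server: str, redshift_query: str, columns: dict) -> tuple:
    """Return (create_statement, refresh_statement).

    The view's defining query runs on Redshift through dblink; its rows are
    stored in RDS, so queries against the view do not touch Redshift until
    the next refresh.
    """
    if "$REDSHIFT$" in redshift_query:
        raise ValueError("query must not contain the $REDSHIFT$ delimiter")
    # dblink returns generic records, so the column list declares their types.
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    create = (
        f"CREATE MATERIALIZED VIEW {view} AS "
        f"SELECT * FROM dblink('{server}', $REDSHIFT${redshift_query}$REDSHIFT$) "
        f"AS t ({column_defs});"
    )
    refresh = f"REFRESH MATERIALIZED VIEW {view};"
    return create, refresh
```

You would run the refresh statement on a schedule to keep the cache current. PostgreSQL's `REFRESH MATERIALIZED VIEW CONCURRENTLY` avoids blocking dashboard readers during the refresh, but it requires a unique index on the view.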
As mentioned before, your RDS instance may run one of many different database engines. Depending on your deployment, third-party tools can synchronize your database with Amazon Redshift, often as a push-button way to create a read replica in AWS. These tools create a replica optimized for Redshift's columnar storage rather than the row-based storage of a transactional database. That makes it easy to perform analytics on your larger data sets. You can even connect other tools like Informatica.
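The row-versus-columnar distinction is what makes Redshift suited to analytics. As a toy illustration only (this is not how Redshift stores data internally), the sketch below pivots row-oriented records, the way a transactional database holds them, into one list per column. An aggregate over a single column then reads just that column instead of every field of every row. The order records are invented sample data.

```python
"""Illustrate row-oriented vs column-oriented layouts (toy model, not Redshift internals)."""

# Row-oriented: each record keeps all of its fields together (sample data).
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 45.5},
]


def to_columns(records: list) -> dict:
    """Pivot records into one list per column (a columnar layout)."""
    if not records:
        return {}
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(rows)

# An analytic query such as SUM(amount) only needs to scan one column.
total_amount = sum(columns["amount"])
print(total_amount)  # prints 245.5
```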
Another, AWS-native way to integrate other databases with Redshift is AWS Data Pipeline. Data Pipeline offers templates for working with Redshift, which makes copying data significantly easier. With this method, you can copy data from files stored in Amazon S3 or from an RDS database into Redshift, creating a read replica of the data for your use case. Data Pipeline also spares you from integrating third-party software into your workflow.
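Loading files from S3 into Redshift, whether a Data Pipeline activity does it for you or you do it by hand, comes down to Redshift's `COPY` command. The sketch below builds such a statement. The table name, bucket, and IAM role ARN are hypothetical, values are assumed to contain no single quotes, and only the CSV and Parquet formats are handled.

```python
"""Build a Redshift COPY statement that loads files from Amazon S3 (sketch)."""

# Formats handled by this sketch; COPY supports more than these.
SUPPORTED_FORMATS = {"CSV", "PARQUET"}


def copy_from_s3_sql(table: str, s3_uri: str, iam_role_arn: str, file_format: str = "CSV") -> str:
    """Return a COPY command that loads the objects under s3_uri into table.

    Inputs are assumed to contain no single quotes (no escaping is done).
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"format not handled by this sketch: {file_format}")
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "  # role Redshift assumes to read the bucket
        f"FORMAT AS {file_format};"
    )
```

You would run the resulting statement over any Redshift connection. For CSV files with a header row, COPY's `IGNOREHEADER 1` option skips the header.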
Redshift and RDS both have compelling use cases. The key to knowing which way to go is to think about your scale and how you want to use the data you own. And you don’t always have to choose between them: RDS and Redshift work just fine individually or in tandem.