Posts

Showing posts from April, 2021

Middleware

Middleware connects two applications or systems so they can exchange data. REST (Representational State Transfer) and SOAP (Simple Object Access Protocol) are two common ways for those systems to communicate.
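To make the REST/SOAP distinction concrete, here is a minimal sketch of the same "fetch an order" request in both styles. The endpoint URL, the GetOrder operation, and the OrderId element are hypothetical names chosen for illustration. Only the SOAP 1.1 envelope namespace is a real identifier.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (the only non-hypothetical identifier here).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)


def build_rest_request(order_id: int) -> dict:
    """REST style: the URL names the resource, the HTTP verb names the action."""
    return {
        "method": "GET",
        "url": f"https://api.example.com/orders/{order_id}",  # hypothetical endpoint
        "headers": {"Accept": "application/json"},
    }


def build_soap_request(order_id: int) -> str:
    """SOAP style: the operation and its arguments travel inside an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, "GetOrder")  # hypothetical operation name
    ET.SubElement(operation, "OrderId").text = str(order_id)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_rest_request(42))
    print(build_soap_request(42))
```

Neither function sends anything over the network. They only build the request, so the difference in shape is easy to compare side by side.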

Redshift and Glue

A data warehouse is a database that serves as a repository for data used in analysis. Relational (transactional) databases typically store individual records, while data warehouses are geared toward aggregated, analytical data. AWS Redshift supports concurrency, with multiple users running multiple queries against your cluster, and it can scale the cluster on demand. Another feature, Redshift Spectrum, lets you query data stored in files on AWS S3 without loading it into the cluster. Redshift stores data in a columnar format and processes queries with a massively parallel architecture. A Redshift cluster consists of a leader node and compute nodes. Clients interact with the cluster through SQL endpoints on the leader node: they send queries to the leader node, which converts them into jobs based on the query logic and sends those jobs to the compute nodes for parallel processing. The compute nodes hold the actual data the queries need; they perform the operations and return their results to the leader node. The leader node then aggregates the results from all ...
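The leader/compute split described above can be sketched as a toy scatter-gather in plain Python. This is not how Redshift is implemented. The data, the function names, and the use of threads to stand in for compute nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data: each inner list is the slice of a sales table held by one "compute node".
COMPUTE_NODES = [
    [("books", 12.0), ("games", 30.0)],
    [("books", 8.0), ("music", 5.0)],
    [("games", 10.0), ("music", 15.0)],
]


def compute_node_job(rows):
    """Runs on one compute node: partial SUM(amount) grouped by category."""
    partial = {}
    for category, amount in rows:
        partial[category] = partial.get(category, 0.0) + amount
    return partial


def leader_node_query(nodes):
    """Leader node: send the job to every compute node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_job, nodes))
    totals = {}
    for partial in partials:
        for category, subtotal in partial.items():
            totals[category] = totals.get(category, 0.0) + subtotal
    return totals


if __name__ == "__main__":
    print(leader_node_query(COMPUTE_NODES))
```

Each node only ever touches its own slice of the data, and the leader only sees small partial aggregates, which is the property that lets this kind of query run in parallel.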

Basics of DynamoDB

DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It is a fully managed, multi-region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second. DynamoDB does not lock items the way a relational database typically does; instead it uses a strategy called optimistic locking. CAP theorem: Consistency, Availability, Partition tolerance. The theorem states that a distributed system can guarantee at most two of the three at a time. Eventual consistency: a read may return stale data that has not yet been replicated across all partitions, but it offers speed and maximum throughput. Strong consistency: a read returns data that has been replicated across all partitions, so it reflects the latest write, but throughput may be affected. Partition tolerance: the system's capability to maintain functionality, ...
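Optimistic locking is easiest to see with a small in-memory sketch. The class below is a hypothetical stand-in for a table, not the DynamoDB API: each item carries a version number, and a write succeeds only if the version the writer read is still the current one. In DynamoDB itself the same idea is usually expressed as a conditional write on a version attribute.

```python
class ConditionalCheckFailed(Exception):
    """Raised when a write was based on a stale version of the item."""


class VersionedTable:
    """Hypothetical in-memory table that enforces optimistic locking."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return a copy so callers cannot mutate stored state directly.
        item = self._items.get(key)
        return dict(item) if item is not None else None

    def put(self, key, attributes, expected_version):
        # A missing item counts as version 0.
        current = self._items.get(key)
        current_version = current["version"] if current is not None else 0
        if current_version != expected_version:
            raise ConditionalCheckFailed(
                f"expected version {expected_version}, found {current_version}"
            )
        # No lock is held: the write simply bumps the version on success.
        self._items[key] = {**attributes, "version": expected_version + 1}


if __name__ == "__main__":
    table = VersionedTable()
    table.put("user#1", {"name": "Ann"}, expected_version=0)

    # Two clients read the same version of the item.
    first = table.get("user#1")
    second = table.get("user#1")

    # The first writer wins; the second must re-read and retry.
    table.put("user#1", {"name": "Ann B"}, expected_version=first["version"])
    try:
        table.put("user#1", {"name": "Ann C"}, expected_version=second["version"])
    except ConditionalCheckFailed as err:
        print("stale write rejected:", err)
```

Nothing is locked while the clients hold their copies. The conflict is detected only at write time, so reads stay cheap and the losing writer is expected to re-read the item and try again.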