Posts

AWS DynamoDB training: Basics

 Each DynamoDB table is a collection of items, and each item is a group of attributes. One or two attributes form the primary key. The partition key is the attribute DynamoDB uses to distribute items across partitions. You can optionally define a second attribute as the sort key, which DynamoDB uses to order the items within each partition. A common approach is to use a single table per application. Items are similar to rows in a relational database, and there is no limit to the number of items in a table. Attributes are similar to columns in a relational database and can be scalar or nested. When you create a table you must specify a primary key, and key attributes must be scalar. The primary key is either the partition key alone or the partition key combined with a sort key. Items are stored and retrieved using the partition key. Tables support two access patterns: queries and scans. A query uses the primary key to read only selected items, while a scan reads every item in the table. Local seconda...
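The difference between a query and a scan can be sketched with a small in-memory model. This is a toy illustration in plain Python, not the DynamoDB API: the `ToyTable` class, its method names, and the `user#…` / `order#…` key values are all hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a table with a partition key and a sort key."""

    def __init__(self):
        # partition key value -> list of (sort key, item) pairs
        self._partitions = defaultdict(list)

    def put_item(self, pk, sk, attributes):
        item = {"pk": pk, "sk": sk, **attributes}
        # The primary key (pk, sk) is unique, so replace any existing item.
        entries = [e for e in self._partitions[pk] if e[0] != sk]
        entries.append((sk, item))
        # Keep each partition ordered by sort key.
        entries.sort(key=lambda e: e[0])
        self._partitions[pk] = entries

    def query(self, pk, sk_begins_with=""):
        """Read only the partition for pk, in sort-key order."""
        return [item for sk, item in self._partitions.get(pk, [])
                if sk.startswith(sk_begins_with)]

    def scan(self, predicate=lambda item: True):
        """Read every item in every partition, then filter."""
        return [item for entries in self._partitions.values()
                for _, item in entries if predicate(item)]


table = ToyTable()
table.put_item("user#1", "order#2024-02", {"total": 30})
table.put_item("user#1", "order#2024-01", {"total": 10})
table.put_item("user#2", "order#2024-01", {"total": 99})

orders = table.query("user#1")                      # touches one partition
big = table.scan(lambda item: item["total"] > 20)   # touches every item
```

The query only looks inside the `user#1` partition and returns its items already ordered by sort key, while the scan has to visit all three items before filtering.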

Middleware

 Middleware connects two applications or systems. REST: Representational State Transfer. SOAP: Simple Object Access Protocol.

Redshift and Glue

 Data warehouses are databases that serve as repositories for data used in analysis. Transactional relational databases typically store individual values, while data warehouses store aggregate values. Amazon Redshift supports concurrency, with multiple users and multiple queries running against your cluster, and it can scale the cluster on demand. Another feature, Redshift Spectrum, lets you query structured and semi-structured data stored in Amazon S3. Redshift stores data in a columnar format and processes it with a massively parallel architecture. A Redshift cluster consists of a leader node and compute nodes. Clients interact with the cluster through SQL endpoints located on the leader node. Clients send queries to the leader node, which converts them into jobs based on the query logic and sends them to the compute nodes for parallel processing. The compute nodes hold the actual data the queries need; they perform the operations and return their results to the leader node. The leader node then aggregates the results from all ...
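The leader/compute split described above is a scatter-gather pattern. Below is a minimal sketch of that idea in plain Python, assuming each "compute node" is just a list of rows and using threads to stand in for parallel nodes. The function names and the sample data are hypothetical, and this is not how Redshift is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(rows, column):
    """A compute node aggregates only the slice of data it holds."""
    return sum(row[column] for row in rows), len(rows)


def leader_average(slices, column):
    """The leader fans the job out, then merges the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: compute_node_sum(s, column), slices))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None


# Three hypothetical compute nodes, each holding part of the table.
slices = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [{"amount": 40}, {"amount": 50}, {"amount": 60}],
]
average = leader_average(slices, "amount")
```

Each node returns a partial sum and row count rather than a partial average, because averages of unevenly sized slices cannot be combined correctly without the counts.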

Basics of DynamoDB

  DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multi-region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second. DynamoDB does not lock objects the way a relational database typically does. Instead, it uses a strategy called optimistic locking. CAP theorem: Consistency, Availability, Partition tolerance. The theorem states that a distributed system can guarantee at most two of the three at a time. Eventual consistency: a read might return stale data that has not yet been replicated across all partitions, but it favors speed and maximum throughput. Strong consistency: a read returns the most recent data, confirmed as replicated across all partitions, but throughput might be affected. Partition tolerance: the system's capability to maintain functionality, ...
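Optimistic locking, mentioned above, can be illustrated with a version number that every writer must match. The sketch below is a plain-Python toy, not DynamoDB code: `ToyStore`, `VersionConflict`, the `version` attribute, and the `cart#1` key are hypothetical names chosen for the example. In DynamoDB itself, this kind of check is typically expressed as a conditional write.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the writer read."""


class ToyStore:
    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._items[key])

    def put(self, key, item, expected_version=None):
        current = self._items.get(key)
        current_version = current["version"] if current else None
        # No lock is taken; the write is rejected if someone else got there first.
        if current_version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        self._items[key] = {**item, "version": (current_version or 0) + 1}


store = ToyStore()
store.put("cart#1", {"qty": 1})          # first write creates version 1

a = store.get("cart#1")                  # two clients read the same version
b = store.get("cart#1")

a["qty"] = 2
store.put("cart#1", a, expected_version=a["version"])   # succeeds, now version 2

b["qty"] = 5
try:
    store.put("cart#1", b, expected_version=b["version"])  # stale version 1
    conflict = False
except VersionConflict:
    conflict = True
```

The second writer is not blocked while the first one works. It only learns about the conflict at write time, and would normally re-read the item and retry.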