Redshift and Glue
A data warehouse is a database that serves as a repository for data used in analysis.
Transactional relational databases are typically used to store and update individual records, while data warehouses are geared toward storing and querying data in aggregate.
Amazon Redshift supports concurrency, with multiple users and multiple queries running against your cluster. It also supports scaling your cluster on demand. Another feature, Redshift Spectrum, allows you to query data stored in Amazon S3 without loading it into the cluster. Data in Redshift is stored in a columnar format and processed using a massively parallel processing (MPP) architecture.
A Redshift cluster consists of a leader node and compute nodes. Clients interact with the cluster through a SQL endpoint located on the leader node. Clients send queries to the leader node, which converts them into jobs based on the query logic and sends those jobs to the compute nodes for parallel processing. The compute nodes hold the actual data the queries need. They perform the operations and return their results to the leader node. The leader node then aggregates the results from all compute nodes and sends the final result back to the client.
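
The leader/compute split described above is a scatter-gather pattern. The following is only a minimal sketch of that idea in plain Python, not how Redshift is actually implemented: the data, the `compute_node_job` and `leader_node_query` functions, and the use of threads to stand in for compute nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of a sales table
# held by one "compute node" as (category, amount) rows.
NODE_SLICES = [
    [("books", 12.0), ("games", 30.0)],
    [("books", 8.0), ("music", 5.0)],
    [("games", 10.0), ("music", 15.0)],
]


def compute_node_job(rows):
    """Runs on one compute node: partial SUM(amount) GROUP BY category."""
    partial = {}
    for category, amount in rows:
        partial[category] = partial.get(category, 0.0) + amount
    return partial


def leader_node_query(slices):
    """Plays the leader node: fans the job out, then merges partial results."""
    # Scatter: one job per compute node, run in parallel (threads as stand-ins).
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_job, slices))

    # Gather: combine the per-node subtotals into the final answer.
    totals = {}
    for partial in partials:
        for category, subtotal in partial.items():
            totals[category] = totals.get(category, 0.0) + subtotal
    return dict(sorted(totals.items()))


result = leader_node_query(NODE_SLICES)
print(result)
```

Each simulated node only ever sees its own rows, and the leader only ever sees the small per-node subtotals, which is the reason this kind of design can scale by adding nodes.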
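Redshift Spectrum lets a single query run against both data in Redshift and data in S3, which can save time and money because the S3 data does not have to be loaded into the cluster first. When you create a cluster, you start by selecting the node type. Each node provides its own memory, storage and I/O.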
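
To make the Spectrum point more concrete, here is a rough sketch of the SQL involved, assembled as Python strings so it can be inspected without a cluster. The schema name, catalog database, IAM role ARN, and the `sales` and `customers` tables are all hypothetical, and the sketch assumes an external `sales` table over S3 has already been defined in the data catalog.

```python
def external_schema_sql(schema, catalog_db, iam_role_arn):
    """Build a CREATE EXTERNAL SCHEMA statement pointing at a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def combined_query_sql(schema):
    """Build a query joining an external (S3) table with a local Redshift table."""
    return (
        "SELECT c.region, SUM(s.amount) AS total_sales\n"
        f"FROM {schema}.sales AS s\n"  # external table, data lives in S3
        "JOIN public.customers AS c ON c.customer_id = s.customer_id\n"  # local table
        "GROUP BY c.region;"
    )


# All values below are placeholders, not real resources.
print(external_schema_sql(
    "spectrum_demo",
    "demo_db",
    "arn:aws:iam::123456789012:role/DemoSpectrumRole",
))
print(combined_query_sql("spectrum_demo"))
```

The generated statements would be run through any SQL client connected to the cluster's leader node endpoint.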