Week 21: CST 363 - Introduction to Database Systems

WK05: Under the Hood - Storage, Design, and Transactions

This week, the big picture came into focus. Earlier modules dug into how data lives on disk - from storage media and table layouts to single- and multi-level indexes, alternatives like bitmap and hash indexes, and how tablespaces, partitions, and physical design shape access paths. In parallel, the database design track kept sharpening the modeling lens: entities and relationships, cardinality, weak vs. strong entities, supertypes/subtypes, and how to implement all of that cleanly with attributes and keys - then normalize (1NF → 3NF → BCNF) before selectively denormalizing when the read patterns demand it. Together, those tracks gave me both the conceptual design vocabulary and the physical storage and indexing model needed to reason about queries.
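To make the normalization idea concrete, here is a minimal sketch in Python with sqlite3. The schema and names (customers, orders, city) are my own illustration, not from the course materials: a transitive dependency (city depends on the customer, not the order) is factored out into its own table, and reads that want the denormalized shape use a join instead.

```python
import sqlite3

# Hypothetical 3NF decomposition: customer facts live in one table,
# orders reference them by key instead of repeating city per order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL   -- depends only on customer_id
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'Seaside')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.50)")

# The denormalized view is recovered with a join at read time.
row = conn.execute("""
    SELECT o.order_id, c.name, c.city
    FROM orders o JOIN customers c USING (customer_id)
""").fetchone()
print(row)  # (10, 'Ada', 'Seaside')
```

If a read-heavy workload hammered that join, this is exactly where selective denormalization (duplicating city into orders) would be the negotiated trade-off.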

On top of that foundation, this week we added Transaction Management: what a transaction is, why ACID matters, and how schedules interleave work. I practiced spotting anomalies (dirty reads, non-repeatable reads, and phantoms) and aligning them with isolation levels - up to Serializable for the strongest guarantees. The concurrency story centered on locks and two-phase locking (including strict 2PL), plus the trade-offs: higher isolation reduces anomalies but can increase contention and even deadlocks, which we detect and resolve (or time out). Recovery rounded out the picture: logging changes so the system can redo committed work and undo uncommitted work after a crash, which makes the durability promise real. Net takeaway: correctness is a policy (isolation), performance is a negotiation (concurrency control), and durability is the safety net (logging + recovery).

If indexes speed up queries, what's a "slow index"?

From the reading: an index lookup isn't just "walk the B-tree and you're done." It has three steps: (1) tree traversal; (2) following the leaf-node chain to gather all matching entries (e.g., when many rows share the same key or range); and (3) table access by rowid for each hit. Only the traversal has a tight upper bound (the index depth). Steps (2) and (3) can explode in I/O when the predicate matches many rows (poor selectivity) and the corresponding table rows are scattered across many blocks. That's why an INDEX UNIQUE SCAN (at most one match) is predictably fast, while an INDEX RANGE SCAN can read a large chunk of the index and then fan out into many random table fetches. The "slow index" myth blames a "degenerated" tree and prescribes rebuilds; the material makes it clear the real culprit is workload characteristics (selectivity plus row scattering), not a broken B-tree. In short: an index can still be "slow" when it helps you find too many rows that are expensive to fetch.
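The selectivity point can be sketched in a few lines of Python with sqlite3 (a different engine than the reading's examples, so the plan wording differs from Oracle's UNIQUE/RANGE SCAN labels; the events table and idx_status index are made-up names). Both predicates traverse the same B-tree, so step (1) costs the same - the difference is how many leaf entries and row fetches follow it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY,"
             " status TEXT, payload TEXT)")
# Skewed data: one key value matches 10 rows, the other 9,990.
conn.executemany("INSERT INTO events (status, payload) VALUES (?, ?)",
                 [("done", "x")] * 9990 + [("failed", "x")] * 10)
conn.execute("CREATE INDEX idx_status ON events(status)")

matches = {}
for status in ("failed", "done"):
    # Same index, same traversal depth...
    (matches[status],) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE status = ?", (status,)).fetchone()
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE status = ?",
        (status,)).fetchall()
    # ...but wildly different leaf-chain and fetch work per query.
    print(status, matches[status], plan[-1][-1])
```

The plan line looks identical for both values, yet status = 'done' does roughly a thousand times the post-traversal work - which is the whole "slow index" illusion: the tree is fine, the workload isn't selective.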
