Tuesday, January 26, 2016

NoSQL notes


  • SQL intro:
    • Declarative - SQL allows you to ask complex questions without thinking about how the data is laid out on disk, which indices to use to access the data, or what algorithms to use to process the data. A significant architectural component of most relational databases is a query optimizer.
  • Problems with SQL - 
    • Complexity leads to unpredictability. SQL's expressiveness makes it challenging to reason about the cost of each query, and thus the cost of a workload.
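    • The relational data model is strict: every row in a table must follow the table's fixed schema, so storing differently shaped records means changing the schema.
    • If the data grows past the capacity of one server, tables must be partitioned across servers and often denormalized, because joins across machines are expensive.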
  • Key-Data Structure Stores - Redis
  • Key-Document Stores - CouchDB/MongoDB/Riak
  • BigTable Column Family Stores - HBase/Cassandra. In this model, a key identifies a row, which contains data stored in one or more Column Families (CFs). Conceptually, one can think of Column Families as storing complex keys of the form (row ID, CF, column, timestamp), mapping to values which are sorted by their keys. This design results in data modeling decisions which push a lot of functionality into the keyspace.
  • Graph Stores - HyperGraphDB and Neo4j are two popular NoSQL storage systems for graph-structured data.
  • Redis is the notable exception to the no-transaction trend. On a single server, it provides a MULTI command to combine multiple operations atomically and consistently, and a WATCH command to allow isolation.
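  • Single-server durability - achieved primarily by controlling how often fsync is called to flush buffered writes to disk (calling it less often is faster but risks losing the most recent writes in a crash).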
  • Multiple server durability - With subtle differences, Riak, Cassandra, and Voldemort allow the user to specify N, the number of machines which should ultimately have a copy of the data, and W<N, the number of machines that should confirm the data has been written before returning control to the user.
  • Sharding/Partitioning means that no one machine has to handle the write workload on the entire dataset, but no one machine can answer queries about the entire dataset.
  • Sharding adds system complexity, and where possible, you should avoid it. Before sharding, try read replicas and caching; Facebook, for example, has Memcached installations in the range of tens of terabytes of memory!
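  • Consistent hashing ring - keys and nodes are hashed onto the same ring, and each key is stored on the first node clockwise from its hash, so adding or removing a node only moves the keys in that node's range.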
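  • Conflict resolution by vector clocks - each version carries a per-node counter, which lets the store tell whether one write supersedes another or whether the two are concurrent and must be reconciled.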
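
The consistent hashing ring from the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the exact scheme any particular store uses: MD5 as the ring hash, 100 virtual nodes per server, and the HashRing class and node names are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hashing ring sketch; MD5 and the replica count are assumptions."""

    def __init__(self, nodes=(), replicas=100):
        # Each server is placed at `replicas` points ("virtual nodes") so that
        # its share of the keyspace is spread more evenly around the ring.
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> server name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Map a string to an integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def get_node(self, key):
        # Walk clockwise: the first point at or after the key's hash owns it,
        # wrapping around to the start of the ring if we fall off the end.
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: adding a server only moves keys onto that server.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The design point is in the usage lines: the only new ring positions belong to node-d, so any key whose owner changes must now map to node-d, and no keys shuffle among the existing servers. With naive `hash(key) % number_of_servers` placement, by contrast, most keys would move when a server is added.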

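The vector-clock bullet above can also be illustrated with a small Python sketch. Assumptions: clocks are plain dicts mapping a replica name to a counter, and increment, descends, compare, and merge are hypothetical helpers rather than any store's API. Vector clocks only detect that two versions are concurrent. The merge-then-increment step at the end is one possible resolution policy; in practice the application often decides how to combine the conflicting values.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (one local write)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if `a` has seen every event recorded in `b` (a >= b per node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions: 'equal', 'after', 'before', or 'concurrent'."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"  # neither saw the other's write: a conflict


def merge(a, b):
    """Per-node maximum: a clock that has seen both versions' history."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


# Usage: two replicas update the same value independently.
v0 = increment({}, "A")   # first write, coordinated by replica A
v1 = increment(v0, "A")   # A writes again on top of v0
v2 = increment(v0, "B")   # B also writes on top of v0, unaware of v1
print(compare(v1, v0))    # after: v1 supersedes v0
print(compare(v1, v2))    # concurrent: the store must keep both or reconcile
# Hypothetical resolution step: merge the histories, then record the fix as a new write by A.
resolved = increment(merge(v1, v2), "A")
print(compare(resolved, v1), compare(resolved, v2))
```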