Some common points:
1. Logging - async logging. A traceId is passed to every layer, so all log lines for one request can be collated for debugging. Hot/cold storage for logs - SSD (recent logs) vs. HDD (older logs).
2. For media (videos) - how to handle long-lived connections? The LB maintains connection state while the backend is undergoing changes. For real-time streaming that's OK - just resume from the current time. For static videos, resume from the last known position. Use DSR (direct server return), where the response bypasses the LB and data goes directly from the server to the client.
3. Analytics - separate storage for archival. EMR jobs. HDFS. Pig scripts.
4. Caching - distributed vs. each node holding a full replica. In the replica case, each node just listens for updates flowing from the master; this saves a network hop but doesn't scale, because the whole dataset must fit in each node's RAM (how much RAM does each node have?). (Aerospike vs. distributed Redis.) Routing in the distributed case - the client periodically syncs the latest routing table from the master so it can hit the right backend directly, instead of asking the master on every call.
5. Queuing - Kafka. Producers, consumers, and the intermediate storage should each be able to scale independently. How to partition data across consumers? How much data to retain while it waits to be consumed? Each consumer should be able to move back and forth in the queue (replay or skip ahead) regardless of how much it has already processed.
6. LB - DNS (round robin/geo-aware) -> L3 (BGP) -> L4 (TCP) -> L7 (HTTP/Redis). Pass-through vs. connection termination. DSR? SSL termination? HTTP/2?
7. Search - Lucene; autocomplete based on tries.
8. Social media content - Taiji: user-clique-based DC routing. A user is likely to consume content generated by friends, so cache the data for a clique of friends in the same DC.
9. Separate the read and write flows so each can scale independently (e.g. a 100:1 read:write ratio).
10. Redundancy - region/zone
11. News feed - push/pull/hybrid
12. DB partitioning - how to distribute tables? What happens to foreign keys and joins across partitions?
13. Security, permissions, file sharing.
14. Dropbox - files are split into chunks so a failed transfer only retries the affected chunk. Exponential backoff. Mobile clients use pull-based sync to save data and battery. A response queue per client; a single global request queue. Inline dedupe vs. post-processing dedupe.
15. Yelp/Uber - quadtree for proximity ("nearby") search.
16. Ticketmaster - transaction isolation levels
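For point 1 (logging), a minimal Python sketch of async logging with a propagated traceId, using only the standard library: a `contextvars` variable holds the id, a filter stamps it onto each record, and `QueueHandler`/`QueueListener` move the actual I/O to a background thread. The logger name `"app"` and the helper functions are hypothetical, and across process boundaries the id would have to travel in request metadata, which this in-process sketch does not show.

```python
import contextvars
import logging
import logging.handlers
import queue
import uuid

# Current request's trace id; contextvars keeps it separate per thread
# and per asyncio task.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the current trace id."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def build_async_logger(sink_handler):
    """Return (logger, listener). The caller only enqueues records;
    the QueueListener thread does the slow I/O via sink_handler."""
    log_queue = queue.Queue(-1)  # unbounded
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(TraceIdFilter())  # runs in the caller's thread
    logger.addHandler(queue_handler)
    listener = logging.handlers.QueueListener(log_queue, sink_handler)
    listener.start()
    return logger, listener


def handle_request(logger):
    # Edge of the system: mint one trace id per incoming request.
    trace_id_var.set(uuid.uuid4().hex)
    logger.info("request received")
    call_downstream(logger)


def call_downstream(logger):
    # Inner layers just log; the filter attaches the same trace id.
    # Across a network hop it would have to be forwarded (e.g. as a header).
    logger.info("calling downstream")
```

A sink such as `logging.StreamHandler()` with `logging.Formatter("%(trace_id)s %(message)s")` would print the id next to each line, which is what makes collation by traceId possible.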
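For point 4 (routing in the distributed cache), a sketch of the client-side option: the client keeps its own copy of the routing table, refreshes it from the master only when a newer version is available, and maps keys to nodes locally. Consistent hashing with virtual nodes is an assumption here for the key-to-node mapping; the master's `(version, nodes)` response and all names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Deterministic 64-bit hash; the built-in hash() is salted per process.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class ClientRouter:
    """Client-side copy of the cache routing table (consistent hashing).

    The client contacts the master only to refresh the table; every
    get/set is routed locally, straight to the owning cache node.
    """

    def __init__(self, vnodes_per_node: int = 100):
        self.vnodes_per_node = vnodes_per_node
        self.version = -1
        self._ring: list[int] = []        # sorted virtual-node hashes
        self._owner: dict[int, str] = {}  # virtual-node hash -> node name

    def sync_from_master(self, version: int, nodes: list[str]) -> None:
        """Apply a (hypothetical) master response; ignore stale versions."""
        if version <= self.version:
            return
        owner = {}
        for node in nodes:
            for i in range(self.vnodes_per_node):
                owner[stable_hash(f"{node}#{i}")] = node
        self._ring, self._owner, self.version = sorted(owner), owner, version

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("routing table not synced yet")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, stable_hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

With consistent hashing, adding a node only moves the keys that now belong to it, so a routing-table update invalidates a fraction of the cache rather than all of it.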
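For point 5 (queuing), a toy in-memory model of the Kafka ideas listed there: keyed partitioning, messages retained after they are read, and per-consumer-group offsets that can be moved back or forward. This is not the Kafka client API; the class and method names are hypothetical.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy stand-in for a Kafka topic (not the Kafka API).

    Messages go to a partition chosen by key, so one key's messages stay
    in order. Reading does not delete anything (a real broker drops data
    by retention policy), and each consumer group keeps its own offset
    per partition, so groups can rewind or skip ahead independently.
    """

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = defaultdict(int)  # (group, partition) -> next offset

    def partition_for(self, key: str) -> int:
        # crc32 is deterministic across processes, unlike hash().
        return zlib.crc32(key.encode()) % len(self.partitions)

    def produce(self, key: str, value) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group: str, partition: int, max_records: int = 10):
        start = self.offsets[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)  # commit
        return batch

    def seek(self, group: str, partition: int, offset: int) -> None:
        # Replay (or skip) from any retained offset.
        self.offsets[(group, partition)] = offset
```

Because producers only append and each consumer group only moves its own offset, the producer side, the storage, and each consumer group can be scaled separately.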
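For point 7, a minimal trie for autocomplete, assuming each stored term carries a search-frequency count used for ranking. `Autocomplete`, `add`, and `suggest` are illustrative names.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # times this exact term was added; 0 = not a term


class Autocomplete:
    """Prefix trie; suggest() returns the k most frequent completions."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, term: str, times: int = 1) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.count += times

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        node = self.root
        for ch in prefix:  # walk down to the node for the prefix
            node = node.children.get(ch)
            if node is None:
                return []
        # DFS over the subtree under the prefix, collecting (count, term).
        found, stack = [], [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.count:
                found.append((cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda t: (-t[0], t[1]))  # most frequent first
        return [term for _, term in found[:k]]
```

This sketch walks the whole subtree under the prefix on every query; a common refinement is to cache the top-k completions at each node so a lookup costs only the length of the prefix.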
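For point 11, a sketch of the hybrid news feed: posts from ordinary authors are pushed into followers' inboxes at write time, while posts from high-follower authors are pulled and merged at read time, which avoids fanning one celebrity post out to every follower's inbox. The follower threshold, the in-memory stores, and all names are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical cutoff, tiny so the example is easy to follow.
CELEBRITY_FOLLOWERS = 3


class HybridFeed:
    """Hybrid news feed: push for ordinary authors, pull for celebrities."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> items pushed to them
        self.outbox = defaultdict(list)    # author -> items they posted

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_FOLLOWERS

    def publish(self, author, ts, text):
        item = (ts, author, text)
        self.outbox[author].append(item)
        if not self.is_celebrity(author):
            # Push (fan-out on write): copy into every follower's inbox.
            for user in self.followers[author]:
                self.inbox[user].append(item)

    def feed(self, user, limit=10):
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Pull (fan-out on read): merge celebrity posts at read time.
                items.update(self.outbox[author])
        # The set drops duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        return [text for _, _, text in heapq.nlargest(limit, items)]
```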
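For point 14 (Dropbox), a sketch of chunked upload with content-hash dedupe and exponential backoff. Assumptions: fixed-size chunks of an assumed 4 MiB, SHA-256 as the chunk id, a dict standing in for the server's chunk index, and a hypothetical `send_chunk` transport that raises `ConnectionError` on transient failures. Skipping chunks the server already has before sending is the inline-dedupe variant; post-processing dedupe would upload everything and merge duplicates later.

```python
import hashlib
import random
import time

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size, for illustration only


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield (sha256 hex digest, chunk bytes) for each fixed-size chunk."""
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def with_backoff(fn, retries=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call fn(); on ConnectionError wait a random time in
    [0, min(cap, base * 2**attempt)] ("full jitter") and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


def upload_file(data: bytes, store: dict, send_chunk, size: int = CHUNK_SIZE):
    """Upload only the chunks the server lacks (inline dedupe) and return
    the file's manifest: the ordered list of chunk hashes."""
    manifest = []
    for digest, chunk in split_into_chunks(data, size):
        if digest not in store:
            # A failure only retries this chunk, not the whole file.
            with_backoff(lambda: send_chunk(digest, chunk))
            store[digest] = chunk
        manifest.append(digest)
    return manifest
```

The jitter spreads retries from many clients over time, so a brief server outage is not followed by all clients retrying at the same moment.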
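For point 15, a minimal point quadtree for Yelp/Uber-style "what is near me" queries: a cell splits into four once it holds more than `capacity` points, so dense areas end up with small cells and a rectangle query only visits cells that overlap it. Planar coordinates, the capacity, and the depth limit (which stops endless splitting when many points share one location) are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Rect:
    x: float  # lower-left corner
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def intersects(self, o: "Rect") -> bool:
        return not (o.x >= self.x + self.w or o.x + o.w <= self.x
                    or o.y >= self.y + self.h or o.y + o.h <= self.y)


@dataclass
class QuadTree:
    bounds: Rect
    capacity: int = 4   # points a leaf holds before it splits
    depth: int = 0
    max_depth: int = 8  # stop splitting, e.g. many points at one spot
    points: list = field(default_factory=list)    # (x, y, payload)
    children: list = field(default_factory=list)  # 4 sub-trees once split

    def insert(self, x: float, y: float, payload) -> bool:
        if not self.bounds.contains(x, y):
            return False
        self._add((x, y, payload))
        return True

    def _child_for(self, x: float, y: float) -> "QuadTree":
        # Pick the quadrant by comparing with the cell's midpoint.
        right = x >= self.bounds.x + self.bounds.w / 2
        top = y >= self.bounds.y + self.bounds.h / 2
        return self.children[2 * right + top]

    def _add(self, p) -> None:
        if self.children:
            self._child_for(p[0], p[1])._add(p)
        elif len(self.points) < self.capacity or self.depth >= self.max_depth:
            self.points.append(p)
        else:
            self._split()
            self._add(p)

    def _split(self) -> None:
        b, hw, hh = self.bounds, self.bounds.w / 2, self.bounds.h / 2
        # Order matches _child_for: index = 2 * right + top.
        self.children = [
            QuadTree(Rect(b.x + dx, b.y + dy, hw, hh), capacity=self.capacity,
                     depth=self.depth + 1, max_depth=self.max_depth)
            for dx in (0, hw) for dy in (0, hh)
        ]
        old, self.points = self.points, []
        for p in old:
            self._child_for(p[0], p[1])._add(p)

    def query(self, area: Rect, out=None) -> list:
        """Return the payloads of all points inside `area`."""
        if out is None:
            out = []
        if not self.bounds.intersects(area):
            return out  # prune: this cell cannot contain matches
        out.extend(pl for px, py, pl in self.points if area.contains(px, py))
        for child in self.children:
            child.query(area, out)
        return out
```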