http://www.slideshare.net/SparkSummit/clickstream-analysis-with-sparkunderstanding-visitors-in-realtime-by-josef-adersberger
Videos
https://www.udemy.com/taming-big-data-with-apache-spark-hands-on/learn/v4/t/lecture/3688138
https://www.safaribooksonline.com/library/view/building-spark-applications/9780134393490/part09.html
https://app.pluralsight.com/player?course=apache-spark-fundamentals&author=justin-pihony&name=apache-spark-fundamentals-m1&clip=6&mode=live
http://www.slideshare.net/tumra/clickstream-social-media-analysis-use-cases-and-examples-using-apache-spark
https://www.youtube.com/watch?v=KiZvHk3ChtM
https://www.quora.com/Is-Scala-a-better-choice-than-Python-for-Apache-Spark
https://www.quora.com/How-do-I-learn-Apache-Spark
http://mollyrossow.com/how%20to/2015/08/04/How-to-run-PySpark-on-AWS-EMR%20and-S3/#sec-8
http://www.tutorialspoint.com/spark_sql/spark_sql_dataframes.htm
https://www.youtube.com/watch?v=odcEg515Ne8
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 SimpleApp.py
-- pull an entire directory of S3 files and process it
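Rough sketch of what SimpleApp.py could look like for the "whole directory" case. Nothing here comes from the linked pages: the s3_glob helper, the bucket/prefix command-line arguments, and the line count as the "process" step are all made up for illustration. It assumes S3 credentials are already configured for the s3a:// scheme that the hadoop-aws package provides.

```python
import sys


def s3_glob(bucket, prefix=""):
    """Build an s3a:// glob matching every object directly under a prefix.

    Hypothetical helper for these notes, not part of Spark or hadoop-aws.
    """
    prefix = prefix.strip("/")
    base = "s3a://" + bucket
    return base + "/" + prefix + "/*" if prefix else base + "/*"


def main(bucket, prefix):
    # Imported here so the helper above can be used without Spark installed;
    # spark-submit puts pyspark on the path when it runs the script.
    from pyspark import SparkContext

    sc = SparkContext(appName="SimpleApp")
    # textFile accepts a glob, so a single RDD covers all matching files.
    lines = sc.textFile(s3_glob(bucket, prefix))
    # Placeholder "processing": count the lines across all files.
    print("lines read: %d" % lines.count())
    sc.stop()


# Only runs when a bucket and prefix are passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Would be run as something like: spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 SimpleApp.py my-bucket logs/2016 (bucket and prefix are placeholders). A glob like this may be slow to list when the prefix holds a very large number of objects.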
http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
http://bigdatasciencebootcamp.com/posts/Part_3/basic_big_data.html
https://flume.apache.org/
http://kafka.apache.org/
Mesos and YARN
https://www.oreilly.com/ideas/a-tale-of-two-clusters-mesos-and-yarn
Hive : https://en.wikipedia.org/wiki/Apache_Hive
HBase : https://hbase.apache.org/