Author: Frank Kane

Our courses are led by Frank Kane, a former Amazon and IMDb developer with extensive experience in machine learning and data science. With 26 issued patents and 9 years of experience at the forefront of recommendation systems, Frank brings real-world expertise to his teaching. His ability to explain complex concepts in accessible terms has helped over one million students worldwide gain valuable skills in machine learning, data engineering, and AI development.

How to Choose the Right Database? – MongoDB, Cassandra, MySQL, HBase

Choosing the right database for your application is no easy task.

You have a wide variety of options: relational databases such as MySQL, or distributed NoSQL solutions such as MongoDB, Cassandra, and HBase. NoSQL has come to mean “not only SQL,” since many distributed database systems do in fact support SQL-style queries as long as you are not doing complex join operations, which further blurs the lines between these systems.

We will talk about how to analyze the requirements of your system in terms of consistency, availability, and partition-tolerance, and how to apply the CAP theorem to guide your choice after showing you where different database technologies fall on the sides of the CAP triangle. We will also talk about more practical considerations, such as your budget, need for professional support, and the ease of integration into the other systems already in place in your organization. Maybe you don’t even need a distributed storage solution at all! Choosing the right technology for your data storage will save you a lot of pain as your application grows and evolves, while making the wrong choice can lead to all sorts of maintenance problems and wasted work.

Your instructor is Frank Kane of Sundog Education, bringing nine years of experience as a senior engineer and senior manager at Amazon.com and IMDb.com, where his job involved extracting meaning from their massive data sets, and processing that data in a highly distributed manner.

Explore the full course on Udemy

Kafka Tutorial for Beginners

Learn to stream big data with Kafka, starting from scratch.

Kafka is a powerful data streaming technology and a very hot technical skill to have right now. With Kafka, you can publish streams of data from web logs, sensors, or whatever else you can imagine to systems that manipulate, analyze, and store that data, all in real time. Kafka brings a reliable publish/subscribe mechanism that is resilient and lets clients pick up where they left off in the event of an outage.
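To make the "pick up where they left off" idea concrete, here is a minimal sketch in plain Python. It is a toy model only: it does not use Kafka or any Kafka client library, and the names TopicLog, publish, poll, and commit are hypothetical, chosen just for this illustration. It assumes a single append-only log and one saved read position (offset) per consumer.

```python
class TopicLog:
    """Toy stand-in for one topic: an append-only list of records.

    Hypothetical illustration only; this is not Kafka's API.
    """

    def __init__(self):
        self._records = []
        self._committed = {}  # consumer name -> next offset to read

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return records starting at the consumer's committed offset."""
        start = self._committed.get(consumer, 0)
        return self._records[start:start + max_records]

    def commit(self, consumer, count):
        """Advance the consumer's committed offset by `count` records."""
        self._committed[consumer] = self._committed.get(consumer, 0) + count


def demo():
    log = TopicLog()
    for line in ["GET /a", "GET /b", "GET /c"]:
        log.publish(line)

    # The reader handles two records, commits its position, then "goes down".
    first = log.poll("reader", max_records=2)
    log.commit("reader", len(first))

    # The publisher keeps writing during the simulated outage.
    log.publish("GET /d")

    # On restart, the reader resumes from its committed offset.
    resumed = log.poll("reader")
    return first, resumed


first_batch, resumed_batch = demo()
print(first_batch, resumed_batch)
```

Because the log keeps every record and the read position is stored separately from the reader, nothing published during the outage is lost or re-read in this toy version.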

In this tutorial, you will set up a free Hortonworks sandbox environment within a virtual Linux machine running right on your own desktop PC, learn about how data streaming and Kafka work, set up Kafka, and use it to publish real web logs on a Kafka topic and receive them in real time. Kafka is sometimes billed as a Hadoop killer due to its power, but really it is an integral piece of the larger Hadoop ecosystem that has emerged.

Explore the full course on Udemy

Build a Serverless App with AWS Lambda – Hands On!

Build a working chat web application from start to finish, using only AWS services! Learn S3, Lambda, Cognito, CloudFront, API Gateway, IAM, and DynamoDB with a real project.

Data Science, Deep Learning, and Machine Learning with Python

Become a data scientist in the tech industry! Learn how to use Python for a wide variety of data science, machine learning, and data mining applications, with hands-on code and real-world examples. My most popular course!

The Ultimate Hands-On Hadoop: Tame Your Big Data!

Learn the larger Hadoop ecosystem and the distributed computing technologies it works with. 15 hours of video content covers over 25 different systems, with hands-on practice and exercises. My biggest course yet!

Learn Elasticsearch 6 and the Elastic Stack

“Elasticsearch 6 and Elastic Stack: In-Depth and Hands-On!” is here! This comprehensive online course covers using Elasticsearch, Logstash, Beats, Kibana, and X-Pack with lots of hands-on examples and exercises, including importing data into Elasticsearch in many different ways. Enroll now to learn these very hot and very valuable skills. (Also available: Elasticsearch 5)

Apache Spark with Scala – Hands On with Big Data!

Learn the hottest technology in wrangling big data on a cluster: Apache Spark! Spark works best with the Scala programming language, so this course will get you up to speed on Scala before diving into everything Spark can do – with lots of hands-on examples and exercises.

Taming Big Data with Apache Spark and Python – Hands On!

Prefer to learn Spark using the more-familiar Python programming language? Get hands-on with the concepts of Apache Spark, and you’ll be computing similar movies using a million movie ratings on a real Hadoop cluster by the end of the course – all just using Python.

Taming Big Data with Spark Streaming and Scala – Hands On!

Learn how to process massive amounts of streaming data in real time on a cluster, using Spark Streaming! Includes a crash course in Scala, and lots of hands-on examples of connecting to various data sources such as Kafka, Flume, TCP ports, Cassandra, and more.

Taming Big Data with MapReduce and Hadoop – Hands On!

Learn the technology that started it all – MapReduce! MapReduce is at the heart of Hadoop, and offers a programming model for processing massive data sets on a cluster in the cloud. Get hands-on with lots of examples using the Python programming language, ranging from simple tasks all the way to analyzing social networks and making movie recommendations with real data sets.
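As a rough taste of the MapReduce programming model, the sketch below counts how many ratings each movie received, using plain Python on a single machine. It is not taken from the course and does not use Hadoop or any MapReduce library. The "user,movie,rating" line format, the sample data, and the function names are assumptions made up for this illustration. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (movie_id, 1) pair for each rating line.

    Assumes a hypothetical 'user_id,movie_id,rating' line format.
    """
    user_id, movie_id, rating = line.split(",")
    yield movie_id, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: collapse one key's values into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


# Made-up sample data for illustration only.
ratings = ["1,50,5", "2,50,3", "2,172,4"]
counts = run_job(ratings)
print(counts)
```

The mapper and reducer only ever see one line or one key at a time, which is what lets a framework spread the same logic over a cluster.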