Videos uploaded by user “Jeffrey James”
The SUM(CASE WHEN pattern in SQL to compute valued columns
 
05:52
Here we compute new and repeat revenue columns using two SUMs, each aggregating the result of a CASE expression that splits orders into new versus repeat. This is a highly useful pattern in SQL and business analytics.
Views: 4776 Jeffrey James
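A minimal sketch of the SUM(CASE WHEN ...) pattern, run in SQLite through Python's standard library rather than the video's database. The orders table and its is_first flag are hypothetical; each CASE returns the amount for matching rows and 0 otherwise, so each SUM only adds up one kind of order.

```python
import sqlite3

# Hypothetical orders table; is_first = 1 marks a customer's first (new) order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL, is_first INTEGER);
INSERT INTO orders VALUES
  (1, '2017-01-01', 10.0, 1),
  (2, '2017-01-01', 20.0, 1),
  (1, '2017-01-01', 5.0, 0),
  (1, '2017-01-02', 7.5, 0),
  (3, '2017-01-02', 12.0, 1);
""")

# Each CASE yields the amount or 0, so each SUM totals only one category of order.
rows = conn.execute("""
SELECT order_date,
       SUM(CASE WHEN is_first = 1 THEN amount ELSE 0 END) AS new_revenue,
       SUM(CASE WHEN is_first = 0 THEN amount ELSE 0 END) AS repeat_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date
""").fetchall()
print(rows)
```

The same query should work in Postgres unchanged; only the connection code is SQLite-specific.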
Python Coin Change Problem (Recursive) Explained in Plain English
 
14:01
https://gist.github.com/jrjames83/94ca6767efba484ec350b9f8d992c0ee We write a solution to the classic problem of making change, given an amount and a list of coins or monetary units. I try to show why the recursive solution makes more sense than an iterative one: the greedy algorithm the recursion uses is the same one any store cashier uses over and over. The crux is this: to make change for, say, $4.33, you look over the available monetary units, grab the highest value that does not exceed the amount still owed, deduct it from that amount, and then do exactly the same thing again. (This greedy approach gives the fewest coins for standard currency systems such as US coins, but not for every possible set of denominations.) Working out the return structure for the recursion is the trickiest part. We end up wrapping the return values in a list, then flattening the list with a nice example I found on Stack Overflow, which is also recursive and relies on a TypeError being raised when you try to iterate over an object that is not iterable (like a number).
Views: 5085
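A minimal sketch of the greedy recursion plus the TypeError-based flatten described above. This is not the code from the gist; it works in integer cents (an assumption, to avoid floating-point rounding), and US_COINS is an illustrative list.

```python
US_COINS = [1, 5, 10, 25, 100]  # illustrative denominations, in cents


def make_change(amount_cents, coins):
    """Greedy recursive change-making; returns a nested list like [100, [25, [...]]]."""
    if amount_cents == 0:
        return []
    # Grab the highest coin that does not exceed the amount still owed.
    coin = max(c for c in coins if c <= amount_cents)
    # Deduct it and do exactly the same thing again on the balance.
    return [coin, make_change(amount_cents - coin, coins)]


def flatten(items):
    """Recursively flatten nested lists of numbers.

    Iterating over a number raises TypeError, which tells us it is a leaf.
    (Strings would recurse forever, so this is only meant for numbers.)
    """
    for item in items:
        try:
            yield from flatten(item)
        except TypeError:
            yield item


print(list(flatten(make_change(433, US_COINS))))  # $4.33
```

With denominations like [1, 3, 4], the greedy choice for 6 is 4 + 1 + 1 even though 3 + 3 uses fewer coins, which is why the greedy shortcut is only safe for "cashier-friendly" coin systems.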
Handling Missing Dates in SQL
 
04:13
Use generate_series in PostgreSQL to build a complete base of dates, then left join your data against it so that dates with no data still show up in the results.
Views: 1549
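The same idea sketched in plain Python, as an illustration rather than the video's SQL: build a day-by-day "spine" (the role generate_series plays), then keep every spine date and fill in 0 where there is no data. The sales dictionary is made up. In SQL, the unmatched rows of a left join come back as NULL, which you would typically wrap in COALESCE(..., 0).

```python
from datetime import date, timedelta


def fill_missing_dates(sales, start, end):
    """Return (day, value) for every day from start to end inclusive."""
    # Build the complete list of days, like generate_series does in Postgres.
    n_days = (end - start).days + 1
    spine = [start + timedelta(days=i) for i in range(n_days)]
    # "Left join": every spine day is kept; days with no sales get 0.
    return [(day, sales.get(day, 0)) for day in spine]


sales = {date(2017, 5, 1): 3, date(2017, 5, 3): 7}  # hypothetical data with gaps
filled = fill_missing_dates(sales, date(2017, 5, 1), date(2017, 5, 4))
print(filled)
```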
PostgreSQL - Setting Up the Database and Left vs Inner Joins
 
09:32
Let's get set up! Here we get the dvdrental database set up and review the basics of joins. You'll need this database: http://www.postgresqltutorial.com/load-postgresql-sample-database/ The video assumes you have PostgreSQL installed on your machine. Please also download pgAdmin 3 (https://www.pgadmin.org/; don't download version 4 if you want to see what I see).
a) Download the zip.
b) Unzip it (http://magma.maths.usyd.edu.au/magma/faq/extract) so you have a dvdrental.tar file.
c) Open pgAdmin and create a new database called "dvdrental".
d) Import the data using the "Restore" command (from the .tar file).
e) With a little luck it should work, and you can follow along with me as we query it.
Views: 4592
Postgres - Use Row_Number to Find Customer First Orders
 
04:25
An easy way to use the ROW_NUMBER() OVER (PARTITION BY ...) window function in Postgres to find each customer's first order, or the first of anything really.
Views: 668
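A minimal sketch of the first-order query, run against a tiny made-up payment table in SQLite (window functions need SQLite 3.25 or newer). Numbering each customer's rows by date and keeping row 1 gives the first order per customer.

```python
import sqlite3

# Tiny hypothetical payment table (not the real dvdrental data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (customer_id INTEGER, payment_date TEXT, amount REAL);
INSERT INTO payment VALUES
  (1, '2017-02-03', 4.99), (1, '2017-01-15', 2.99),
  (2, '2017-03-01', 0.99), (2, '2017-03-09', 5.99);
""")

# Number each customer's payments by date, then keep only row 1.
first_orders = conn.execute("""
SELECT customer_id, payment_date, amount
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY payment_date) AS rn
  FROM payment
) AS t
WHERE rn = 1
ORDER BY customer_id
""").fetchall()
print(first_orders)
```

Changing the filter to rn = 2 gives each customer's second order instead.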
Markov Chain Gamblers Ruin Random Walk Using Python 3.6
 
12:19
https://gist.github.com/jrjames83/7f2b5466182b4add94f80dc06f170ee9 A Markov chain has the property that the next state of the system depends only on the current state, not on the states that came before it. We look at how long it takes to run out of gambling funds in the following scenario:
- $1.00 bet on heads every flip (always heads)
- $10 starting capital
- How many flips until our gambling funds are gone?
We run this trial 500 times using Python's standard library random.random() function, then NumPy's. Not surprisingly, NumPy's version runs much more quickly, and it gives a more conservative estimate of the number of turns it will take. My theory is that NumPy's random algorithm behaves differently, but I cannot be sure: both are pseudo-random generators, and with a fair coin the number of flips until ruin varies enormously from run to run, so 500 trials may simply not be enough to tell them apart.
Views: 2175
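A minimal sketch of the simulation using only the standard library (not the gist's code). One assumption is added: a cap on the number of flips. With a fair coin and no upper limit on winnings, ruin eventually happens with probability 1, but the expected number of flips is infinite, so some trials would otherwise run for a very long time.

```python
import random


def flips_until_broke(capital=10, bet=1, max_flips=10_000, rng=random):
    """Bet `bet` on heads every flip; return flips until capital hits 0,
    or None if max_flips is reached first (an assumed cap)."""
    for flip in range(1, max_flips + 1):
        if rng.random() < 0.5:  # heads: win the bet
            capital += bet
        else:                   # tails: lose the bet
            capital -= bet
        if capital <= 0:
            return flip
    return None


rng = random.Random(42)  # seeded so runs are repeatable
results = [flips_until_broke(rng=rng) for _ in range(500)]
finished = [r for r in results if r is not None]
print(len(finished), "of 500 trials went broke within the cap")
```

Comparing the median of `finished` rather than the mean is one way to get a more stable summary, since a handful of very long trials dominate the average.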
Image Classification (Keras) For Idiots - Bill Gates vs Jeff Bezos
 
29:15
Repo for the tutorial, with code versions, the notebook and the images, if you don't feel like following along: https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier/tree/master
Topics covered in this video:
- Using the Google Images download script to find images of both men
- Creating our validation set, and the importance of validation data in machine learning
- A very brief overview of the Keras Sequential model and a decent boilerplate convnet to classify the images
- A quick overview of the image generators, and of structuring images in directories so Keras can easily infer the labels
- Running the model and getting about 75% accuracy
- A preview of using VGG16 as a feature extractor. The ImageNet weights may have learned features that give us higher-quality attributes for telling Bezos and Gates apart (we'll see!)
Views: 2402
InstaPy for Instagram Crontab Setup Issues Mac OSX
 
03:59
A quick walkthrough of common setup errors for InstaPy, at least in the context of running it from a cron job on OS X. If you get the messages below, this may help you fix them by editing the absolute paths to chromedriver and to the SQLite database that comes with InstaPy (relative paths such as './db/instapy.db' are resolved against cron's working directory, not your InstaPy folder):
NameError: name 'session' is not defined
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.chrome.service.Service object at 0x1010a5d90>> ignored
conn = sqlite3.connect('./db/instapy.db')
sqlite3.OperationalError: unable to open database file
Remember to check your mail after the cron job runs to verify it all worked, with "cat /var/mail/username". Check whether a process is running with "ps wx | grep python".
To run the code every 2 hours (assuming this is the right path to your quickstart file, or whatever file you use):
0 */2 * * * python /Users/username/InstaPy/quickstart.py
Views: 1441
Illustrating the Central Limit Theorem Using Python and Numpy
 
08:02
https://gist.github.com/jrjames83/e37bba221e4de5689aa8dc6f82012548 Using NumPy, we generate a population distribution with non-normal characteristics (from the gamma family). Then, using the random module, we take a series of samples from that distribution, compute each sample's average, and plot the distribution of those averages. The result is that the averages are approximately normally distributed. We then observe that the mean of those sample averages is approximately the same as the mean of the gamma population distribution. The upshot is that you can now leverage known traits of the normal distribution to make inferences about the parent distribution.
Views: 868
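A stdlib-only sketch of the same experiment (the video generates the population with NumPy). The gamma shape and scale are assumed values, giving a population mean of shape × scale = 6; the sample size of 50 and the 2,000 repetitions are also arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)
SHAPE, SCALE = 2.0, 3.0  # assumed gamma parameters; population mean = 6.0

# A skewed, clearly non-normal population.
population = [rng.gammavariate(SHAPE, SCALE) for _ in range(100_000)]

# Take many samples and keep only each sample's average.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)]

print("population mean:", statistics.fmean(population))
print("mean of sample means:", statistics.fmean(sample_means))
# The spread of the averages shrinks like sigma / sqrt(n).
print("stdev of sample means:", statistics.stdev(sample_means))
print("sigma / sqrt(50):", statistics.pstdev(population) / 50 ** 0.5)
```

Plotting a histogram of `sample_means` (for example with matplotlib) should show the familiar bell shape even though the population itself is right-skewed.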
Intro To Date Parsing and Time Series Data in SQL
 
07:05
Learn how to use EXTRACT and TO_CHAR to pull key features out of dates or timestamps in your relational database.
Views: 1281
Using postgres regexp_replace to clean text data
 
07:12
Sometimes you need to remove characters or otherwise clean data before you extract it, and regexp_replace is a very useful function for that. We cover it in some detail, including word boundaries and flags.
Views: 1459
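The same cleaning ideas sketched with Python's re.sub, as an analogy rather than Postgres code; the sample text is made up. Two differences worth knowing: Postgres's regexp_replace only replaces the first match unless you pass the 'g' flag (re.sub replaces all matches unless you pass count), and Postgres regular expressions write word boundaries as \y (or \m and \M), whereas in Python they are \b.

```python
import re

text = "Call 555-1234 or 555 9876; the CAT sat on the category mat."

# Like regexp_replace(text, '\d', '', 'g'): strip every digit.
no_digits = re.sub(r"\d", "", text)

# Whole word, case-insensitive (like the 'gi' flags with \y...\y in Postgres).
# "category" is untouched because the word boundary fails after "cat".
no_cat = re.sub(r"\bcat\b", "dog", text, flags=re.IGNORECASE)

# Without 'g', Postgres replaces only the first match; count=1 mimics that.
first_only = re.sub(r"\d+", "#", text, count=1)

print(no_digits)
print(no_cat)
print(first_only)
```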
Array Operations in Postgres - Grouping to An Array
 
07:20
Here we take the values from a GROUP BY and collect them into an array, while covering basic array operations such as summing and slicing.
Views: 1447
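A plain-Python analogy of grouping values into arrays, then summing and slicing them; the rental rows are made up. One detail that often trips people up when moving between the two: Postgres arrays are 1-based and slices include both ends, so arr[1:2] in Postgres is the first two elements, which is arr[0:2] in Python.

```python
from collections import defaultdict

rentals = [("Ann", 2.99), ("Bob", 0.99), ("Ann", 4.99), ("Ann", 0.99)]  # hypothetical rows

# Roughly: SELECT name, array_agg(amount) FROM rentals GROUP BY name
grouped = defaultdict(list)
for name, amount in rentals:
    grouped[name].append(amount)

# Summing each group's array.
totals = {name: sum(amounts) for name, amounts in grouped.items()}

# Postgres arr[1:2] (1-based, inclusive) corresponds to Python amounts[0:2].
first_two = {name: amounts[0:2] for name, amounts in grouped.items()}

print(dict(grouped), totals, first_two)
```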
Using python to compute distance between points from the gps data
 
08:54
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 We don't have a notion of cumulative distance yet, so we bring in a Python package, geopy (https://pypi.python.org/pypi/geopy), which does the math to convert pairs of lat/lon points into distances. We explore a more obscure zip statement to generate adjacent pairs from a list, compute the distance between each pair with a nice-looking list comprehension, and finally sum the distances and compare the total to the actual measured distance of the incline.
Views: 3214
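A stdlib-only sketch of the adjacent-pairs trick. It swaps geopy out for the haversine formula on a spherical Earth (an assumption: geopy's geodesic distance uses an ellipsoidal model, so its numbers will differ slightly). The zip(points, points[1:]) call pairs each point with the next one.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def total_distance_km(points):
    # zip(points, points[1:]) yields adjacent pairs: (p0, p1), (p1, p2), ...
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


print(haversine_km((0, 0), (0, 1)))  # one degree of longitude at the equator
```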
Pandas Garmin Intro - Parsing the GPS Data using BeautifulSoup and Python
 
09:34
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 Using a slightly different technique, we use a zip statement to pair up the elevations and timestamps from the BeautifulSoup object we parsed earlier.
Views: 2107
SQL Row Number / Window Function Example - Top NY Baby Names by Year
 
10:58
Using BigQuery's free tier (1 TB per month, https://cloud.google.com/bigquery/pricing), we explore baby name data and cover the basics of ROW_NUMBER(), WHERE clauses, and subqueries for filtering inner queries. This was recorded live, so you'll see a few errors and my thought process. A great way to learn SQL (applicable to PostgreSQL, Oracle, SQL Server and others).
Views: 455
Python Knapsack 01 Problem - Dynamic Programming Part 2
 
19:26
Here we code the dynamic programming solution to the knapsack problem in Python: https://gist.github.com/jrjames83/5aeabcdbe30e3b7d6a069113e2e7190c The original spreadsheet of the hand-coded algorithm: https://docs.google.com/spreadsheets/d/1y8mG0B4bg2K_aZ9U67T9eQno7z8oznxumUFUIKo4pXI/edit#gid=0
Views: 2084
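A compact sketch of the bottom-up table that the spreadsheet fills in by hand (not necessarily line for line the gist's code). table[i][w] holds the best value achievable using the first i items with a weight limit of w; each cell either skips item i or takes it if it fits.

```python
def knapsack_01(values, weights, capacity):
    """Bottom-up 0/1 knapsack; returns the best total value within capacity."""
    n = len(values)
    # (n + 1) x (capacity + 1) table; row 0 means "no items", so all zeros.
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = values[i - 1], weights[i - 1]
        for w in range(capacity + 1):
            table[i][w] = table[i - 1][w]  # option 1: skip item i
            if weight <= w:                # option 2: take item i, if it fits
                table[i][w] = max(table[i][w], table[i - 1][w - weight] + value)
    return table[n][capacity]


print(knapsack_01([60, 100, 120], [10, 20, 30], 50))
```

Each item is used at most once because every cell only looks at the previous row; that is what makes this the 0/1 variant.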
Postgres - Top Movies by Genre, Window Functions and CTEs
 
12:12
A nice live-coding session on our movie rental database, finding the top movies by genre using window functions and multiple common table expressions.
Views: 1578
SQL - Get Day of Week With extract() + CASE statement
 
09:15
A good introduction to date operations in PostgreSQL, using a CASE statement to turn the extracted day numbers into day-of-week labels (Mon, Tue, etc.).
Views: 1961
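A Python analogy of EXTRACT(DOW ...) followed by a CASE that maps numbers to names. Postgres's DOW field numbers the days from 0 (Sunday) to 6 (Saturday); Python's isoweekday() runs from 1 (Monday) to 7 (Sunday), so taking it modulo 7 lines the two up. The dictionary stands in for the CASE expression.

```python
from datetime import date

# Stands in for: CASE dow WHEN 0 THEN 'Sun' WHEN 1 THEN 'Mon' ... END
DAY_NAMES = {0: "Sun", 1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri", 6: "Sat"}


def pg_dow(d):
    """Mirror Postgres EXTRACT(DOW ...): 0 = Sunday ... 6 = Saturday."""
    return d.isoweekday() % 7  # isoweekday: Monday = 1 ... Sunday = 7


def day_label(d):
    return DAY_NAMES[pg_dow(d)]


print(pg_dow(date(2017, 1, 1)), day_label(date(2017, 1, 1)))  # 1 Jan 2017 was a Sunday
```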
Using the google maps API to compute the distance between points - and parse some JSON
 
09:11
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 Here we look at an example of an API, along with a Python package called "googlemaps" (https://github.com/googlemaps/google-maps-services-python). Get an API key at https://developers.google.com/maps/documentation/distance-matrix/ (if you use mine it won't work ;) ). We make a request using two lat/lon points from our DataFrame containing the GPS data, get the distance, and parse the nested JSON response in real time.
Views: 5673
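A sketch of digging the distance out of a Distance Matrix style response. The JSON below is made up for illustration (no API call is made), but it follows the documented layout: one entry in "rows" per origin, one entry in "elements" per destination. The googlemaps client's distance_matrix method hands you this structure already parsed into a Python dict.

```python
import json

# Made-up response with one origin and one destination.
SAMPLE_RESPONSE = """{
  "status": "OK",
  "origin_addresses": ["Origin A"],
  "destination_addresses": ["Destination B"],
  "rows": [
    {"elements": [
      {"status": "OK",
       "distance": {"text": "1.4 km", "value": 1432},
       "duration": {"text": "4 mins", "value": 245}}
    ]}
  ]
}"""


def first_distance_meters(response):
    """Return the distance (meters) for the first origin/destination pair, or None."""
    # rows[i] is the i-th origin; elements[j] is the j-th destination for it.
    element = response["rows"][0]["elements"][0]
    if element["status"] != "OK":
        return None
    return element["distance"]["value"]


data = json.loads(SAMPLE_RESPONSE)
print(first_distance_meters(data))
```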
Python 3.6 Roman Numerals Kata: Recursive Solution
 
09:58
Using Python 3.6 dictionaries, which retain their insertion order (a CPython implementation detail in 3.6, guaranteed by the language from 3.7), to solve the Roman Numeral Kata: https://gist.github.com/jrjames83/a382bfea37fd7aadac2a4f6245e7441b The key is, on each iteration, to bracket the input integer against the symbols' integer values, work out how much is left over, and recursively call the function on the remaining balance.
Views: 1095
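A minimal sketch of the recursive approach (written independently of the gist). It relies on the dictionary keeping its insertion order, largest value first, and includes the subtractive pairs such as 900 (CM) and 4 (IV) so no special cases are needed.

```python
# Ordered from largest to smallest; dict order is guaranteed from Python 3.7.
SYMBOLS = {1000: "M", 900: "CM", 500: "D", 400: "CD", 100: "C", 90: "XC",
           50: "L", 40: "XL", 10: "X", 9: "IX", 5: "V", 4: "IV", 1: "I"}


def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    if n == 0:
        return ""
    # Find the largest value that fits, emit its symbol, recurse on the balance.
    for value, symbol in SYMBOLS.items():
        if value <= n:
            return symbol + to_roman(n - value)


print(to_roman(1994))
```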
Abstract Intro to the ROW_NUMBER Window Function Using Random Data
 
06:27
A very slow and precise walkthrough of using ROW_NUMBER and PARTITION BY in a window function. Try it out!
Views: 184
Keras Tutorial (Moving on to Border Collie vs Yellow Labrador)
 
12:31
https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier After lots of manual cleaning of the Bezos vs Gates dataset, and repeated failed attempts to beat the scratch convnet's accuracy of 80% by retraining the top layer of VGG as well as fine-tuning VGG, we move on to a more tractable problem. We get about 600 border collie images and 600 yellow lab images.
- Our standard convnet architecture gets 90% validation accuracy (good).
- We briefly review the Python code that splits our data into train and validation sets, being careful to shuffle the data to avoid potentially introducing bias.
I'm now on a Windows machine with a GTX 1080 Ti card, so data processing is around 20x faster than on the i5 chip in my iMac. In the next video we'll focus on predicting the outcomes of some images, before moving on to trying for 95% accuracy using transfer learning and fine-tuning. Stay "tuned". Jeff
Views: 72
A Gentle Intro to the LAG() Window Function in SQL
 
06:34
A gentle intro to the LAG() window function in SQL, complete with a few mistakes, which may help you understand it more readily!
Views: 308
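A minimal LAG() sketch on a made-up rental table in SQLite (3.25 or newer for window functions). LAG reads the previous row within each customer's partition; the first row of each partition has no previous row, so it gets NULL (None in Python). SQLite's julianday() is used here for the date difference in place of Postgres date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental (customer_id INTEGER, rental_date TEXT);
INSERT INTO rental VALUES
  (1, '2017-01-01'), (1, '2017-01-05'), (1, '2017-01-20'),
  (2, '2017-02-10'), (2, '2017-02-11');
""")

rows = conn.execute("""
SELECT customer_id, rental_date,
       LAG(rental_date) OVER (PARTITION BY customer_id ORDER BY rental_date) AS prev_date,
       julianday(rental_date)
         - julianday(LAG(rental_date) OVER (PARTITION BY customer_id ORDER BY rental_date))
         AS days_since_prev
FROM rental
ORDER BY customer_id, rental_date
""").fetchall()

for row in rows:
    print(row)
```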
SQL - Break out Customer Orders by New vs Repeat
 
06:57
A good approach to splitting orders into new and repeat, then using a CASE statement to solidify the distinction and aggregate further based on the results.
Views: 265
Keras Gates Bezos 2 - Bottleneck Features 78% Accuracy :/
 
21:42
Repository for the project: https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier In this video we use VGG16 (without its dense layers) to generate features for training a multilayer perceptron (MLP), which does not use convolutions. We find that we don't do as well as the scratch convnet. I propose a theory about the differences between Gates and Bezos: their visual differences may not be significant to the ImageNet-trained network's weights, which learned to distinguish much broader categories (monkey vs airplane, etc.). For example, had one of them worn a western hat all the time, I bet it would have done a better job of telling them apart. In the next video we'll look at getting more data and see whether our convnet and bottleneck strategies improve. Other topics we'll cover in later videos:
- image similarity strategies
- are there any pictures of Gates IN the Bezos folder, since they are two of the richest men in the world?
- fine-tuning
- using a problem with more than 2 classes, to talk about one-hot encoding and softmax classification
Views: 450
Python Maximum Subarray Problem in Plain English
 
12:19
https://gist.github.com/jrjames83/c23be13b273c32b74861e09d97dd1d1b A look at the classic maximum contiguous subarray problem in plain English. Includes my detailed thought process, along with trial and error using a greedy approach. Also highlights issues with the solution as the array size grows by factors of 10, up to 10,000 elements: it took around 48 seconds to run on the 10,000-element NumPy array. I don't handle the case of all negative numbers, or other edge cases.
Views: 960
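For comparison, here is Kadane's algorithm, a different technique from the one in the video, which solves the same problem in a single O(n) pass and also handles the all-negative case. At each element it decides whether to extend the running subarray or start a fresh one at that element. It assumes a non-empty list.

```python
def max_subarray(nums):
    """Kadane's algorithm: maximum sum of a contiguous, non-empty subarray."""
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the running subarray with x, or start over at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # subarray [4, -1, 2, 1]
```

Because it touches each element once, a 10,000-element input should take on the order of milliseconds rather than seconds.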
Keras Image Classification (Border Collie vs Yellow Lab : Image Similarity)
 
18:47
https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier/blob/master/gatesbezos/Border%20Collie%20or%20Yellow%20Lab.ipynb This is a fun one. We pull a layer from our trained convnet, which knows how to predict between the dog breeds, and use it to extract a 96-dimensional vector for each of our 1000 training photos. We then do some linear algebra to compute a similarity matrix using cosine distance, and I walk through the Python required to find, for any given input image, the most similar images in our dataset. I also discuss how you could use ImageNet-trained models to similar effect, especially if you have tons of images.
Views: 533
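A stdlib-only sketch of the ranking step, using tiny made-up 2-D vectors in place of the 96-dimensional convnet features (the notebook uses NumPy for the matrix version). Cosine similarity is the dot product divided by the product of the vector lengths; cosine distance is 1 minus that. Zero vectors would cause a division by zero and are not handled here.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query_index, vectors, top_k=3):
    """Indices of the top_k vectors most similar to vectors[query_index]."""
    scores = [(cosine_similarity(vectors[query_index], v), i)
              for i, v in enumerate(vectors) if i != query_index]
    # Highest similarity (smallest cosine distance) first.
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]


# Made-up feature vectors standing in for per-image embeddings.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(most_similar(0, vectors, top_k=2))
```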
Postgres split_part, plus CASE stmt tutorial
 
13:56
Just a good deep dive into a very specific query, but if you follow along you'll probably improve your skills a bit.
Views: 583
Postgres Generate Series and Inner vs Left Joins
 
07:41
Learn all about generate_series and solidify your knowledge of the key join types, inner and left joins (with some discussion of right joins).
Views: 491
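A small sketch of the inner vs left join difference on made-up customer and payment tables in SQLite. An inner join keeps only customers with matching payments; a left join keeps every customer and fills the payment columns with NULL (None) where there is no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, name TEXT);
CREATE TABLE payment (customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO payment VALUES (1, 2.99), (1, 4.99), (3, 0.99);
""")

# Only customers that have at least one payment.
inner = conn.execute("""
SELECT c.name, p.amount FROM customer c
INNER JOIN payment p ON p.customer_id = c.customer_id
ORDER BY c.name, p.amount
""").fetchall()

# Every customer; Bob has no payments, so his amount is NULL.
left = conn.execute("""
SELECT c.name, p.amount FROM customer c
LEFT JOIN payment p ON p.customer_id = c.customer_id
ORDER BY c.name, p.amount
""").fetchall()

print(inner)
print(left)
```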
SQL - Find New Customers by Date of Acquisition
 
10:10
Using date parsing functions, we find the number of customers acquired per day in our DVD rental dataset.
Views: 357
Simulate ROW_NUMBER() Without Window Functions
 
04:44
Using a correlated subquery, we can create our own ROW_NUMBER function, similar to what you'd find in PostgreSQL, SQL Server or Oracle.
Views: 151
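A minimal sketch of the correlated-subquery trick in SQLite, on a made-up film table. For each row, the subquery counts how many rows sort at or before it, which equals that row's position. This only matches ROW_NUMBER when the ordering column has no ties; tied rows would share a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (title TEXT, length INTEGER);
INSERT INTO film VALUES ('A', 90), ('B', 120), ('C', 75), ('D', 100);
""")

# For each film f, count the films whose length is <= f.length.
rows = conn.execute("""
SELECT f.title, f.length,
       (SELECT COUNT(*) FROM film AS f2 WHERE f2.length <= f.length) AS row_num
FROM film AS f
ORDER BY row_num
""").fetchall()

print(rows)
```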
Compute Moving Averages Using Postgres
 
04:57
Easily compute moving averages that look back and forward an arbitrary number of rows within the window. You could also swap AVG for SUM to create a rolling sum if you'd like.
Views: 352
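A small sketch of window frames for moving averages and rolling sums, run in SQLite (3.25 or newer) on made-up daily totals. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING gives a centered 3-row average; ROWS BETWEEN 2 PRECEDING AND CURRENT ROW gives a trailing 3-row sum. At the edges the frame simply contains fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales (day INTEGER, amount REAL);
INSERT INTO daily_sales VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
""")

rows = conn.execute("""
SELECT day, amount,
       AVG(amount) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_avg,
       SUM(amount) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_sum_3
FROM daily_sales
ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```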
Use LAG() To find timing between the 1st and 2nd Customer Order
 
11:18
Use a combination of ROW_NUMBER and the LAG() function to determine the average time between customers' first and second orders in our DVD rental database.
Views: 195
Python Knapsack 01 Problem - Dynamic Programming Part 1
 
14:54
https://docs.google.com/spreadsheets/d/1y8mG0B4bg2K_aZ9U67T9eQno7z8oznxumUFUIKo4pXI/edit#gid=0 We work through the algorithm by hand using a table, on a limited problem. In the next video we'll actually code the algorithm in Python.
Views: 1366
Pandas - Handling Timestamps, Getting Prior Rows via Shift
 
09:49
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 Now, using dateutil, we parse the Unicode timestamp strings we got from the .gpx file with BeautifulSoup. We also get the time since the prior GPX reading using the pandas DataFrame shift method; once dateutil's parse function has converted the strings to datetime objects, this is easily done. https://dateutil.readthedocs.io/en/stable/
Views: 888
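A stdlib-only sketch of the same two steps, swapping dateutil for datetime.strptime with an assumed timestamp format (GPX times typically look like '2017-05-06T14:30:05Z'; strings with fractional seconds would need a different format). The zip of each time with the next plays the role of pandas' shift: the first reading has no prior row, which pandas would show as NaT.

```python
from datetime import datetime


def parse_gpx_time(s):
    # Assumed format: UTC ISO 8601 with a trailing 'Z' and whole seconds.
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")


def seconds_since_prior(timestamps):
    times = [parse_gpx_time(s) for s in timestamps]
    # Pair each reading with the one before it, like comparing to df["time"].shift(1).
    return [None] + [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = seconds_since_prior(
    ["2017-05-06T14:30:00Z", "2017-05-06T14:30:05Z", "2017-05-06T14:31:05Z"]
)
print(gaps)
```

In pandas, the equivalent would be along the lines of df["time"] - df["time"].shift(1) once the column holds datetimes.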
Using a Common Table Expression To Find Customer 1st Order Date
 
04:05
Using a common table expression (CTE) to find each customer's first order date in our DVD rental database. We don't rely on window functions here; we use a CTE instead. CTEs can be more readable than selecting from a nested SELECT statement.
Views: 148
Using The Collie vs Yellow Lab Model 90% to Explore our Validation Set
 
15:19
https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier/blob/master/gatesbezos/Border%20Collie%20or%20Yellow%20Lab.ipynb With the 90% accuracy of our "from scratch" convnet in Keras, we now use the model.predict_generator method to classify our validation data and inspect performance using confusion matrices, along with loading the images in Jupyter. I also come up with some code on the fly to find the indices of misclassified images, and we inspect them to see why the convnet classified them incorrectly. Some important aspects to understand:
- generator properties (filenames, classes, class_indices)
- don't shuffle your validation generator when using it for prediction!
Next video we'll go back to transfer learning topics (bottleneck features and fine-tuning).
Views: 57
SQL - Multiple Row_Number functions addressing different partitions
 
04:01
Find the top-selling days per quarter and per month by using two ROW_NUMBER window functions, one partitioned by month and one by quarter.
Views: 370
Simulating The Binomial Theorem in Python, Flipping a Coin
 
13:30
https://gist.github.com/jrjames83/2b922d36e81a9057afe71ea21dba86cb Getting 10 heads in a row in a given run of 10 tosses should happen 1 time in 1024: the probability is 0.5^10, and 1 divided by that value is 1024. (Ten of the same face, either all heads or all tails, is twice as likely: 1 in 512.) To verify this, we simulate 1024 tosses of a fair coin, then use Python's itertools groupby and the collections module's Counter class to see how many runs of consecutive heads or tails our simulations generated.
Views: 312
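A minimal sketch of the run-counting step (not the gist's code); the seed and the 100,000 flips are arbitrary choices. itertools.groupby collapses consecutive identical flips into runs, and Counter tallies how often each run length appears. For a fair coin, each longer run length should show up roughly half as often as the one before it.

```python
import random
from collections import Counter
from itertools import groupby

rng = random.Random(7)  # seeded for repeatability
flips = [rng.choice("HT") for _ in range(100_000)]

# groupby yields (face, run) for each stretch of identical consecutive flips.
run_lengths = Counter(len(list(run)) for _, run in groupby(flips))

for length in sorted(run_lengths)[:10]:
    print(length, run_lengths[length])
```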
Keras - Gates vs Bezos Classification #3 - Scripting Data Split (train/valid)
 
27:34
https://github.com/jrjames83/Keras-Gates-vs-Bezos-Image-Classifier I download some more images, then show you how to use Python to create train and validation sets. We find some interesting things:
- As we expand our validation set, our validation accuracy worsens (prior best of close to 80%, now in the 70s).
- In the first run, with a smaller training set, I took the first 50 images for validation without shuffling, which could have given me a more favorable validation set than the approach highlighted in this video.
In any event, when you're working with small data, metrics can be a bit "all over the place", and you really need to inspect the output of your work and even look through your training and validation sets to verify they are what you believe they are. In the next video we'll try to get better accuracy by fine-tuning a model. So far we've trained one from scratch and used the bottleneck features from VGG16.
Views: 231
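A minimal sketch of a shuffled train/validation split over a list of filenames; the function name, the 20% fraction and the seed are illustrative assumptions, not the repo's script. Shuffling first avoids the "first 50 images" effect described above. Copying the files into per-class train and validation directories (for example with shutil.copy) is left out.

```python
import random


def train_valid_split(filenames, valid_fraction=0.2, seed=42):
    """Shuffle a copy of filenames and return (train, valid) lists."""
    files = list(filenames)
    # Shuffle first so validation isn't just whatever came first on disk.
    random.Random(seed).shuffle(files)
    n_valid = int(len(files) * valid_fraction)
    return files[n_valid:], files[:n_valid]


filenames = [f"img_{i}.jpg" for i in range(100)]  # hypothetical image names
train, valid = train_valid_split(filenames)
print(len(train), len(valid), valid[:3])
```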
PostgreSQL - Basic Aggregations, Counting, Where/Having and Text Matching
 
12:08
Here we find customer spend totals along with the average rental value, then filter rows based on aggregates as well as on existing columns. We learn about grouping columns and aliasing again, and conclude with a join and some text-filtering logic. Final query: https://gist.github.com/jrjames83/677a00ae7c0863c22b2eafdf36f670f7
Views: 1022
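A small sketch of filtering on existing columns versus on aggregates, using a made-up payment table in SQLite rather than the final query from the gist. WHERE removes individual rows before grouping; HAVING removes whole groups after the aggregates are computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (customer_id INTEGER, amount REAL);
INSERT INTO payment VALUES (1, 2.99), (1, 4.99), (2, 0.99), (3, 9.99), (3, 5.99);
""")

rows = conn.execute("""
SELECT customer_id, SUM(amount) AS total_spend, AVG(amount) AS avg_rental
FROM payment
WHERE amount > 1.00          -- row filter: applied before grouping
GROUP BY customer_id
HAVING SUM(amount) > 10      -- group filter: applied after aggregation
ORDER BY customer_id
""").fetchall()

print(rows)
```

Customer 2's only payment fails the WHERE test, and customer 1's total (7.98) fails the HAVING test, so only customer 3 remains.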
Basics of SQL Joins using DVD Rental Database
 
05:30
Straight talk on joins, at least in the context of BI or data science.
Views: 1267
PostgreSQL Row Number (Hourly sales for each staff member)
 
08:11
A great example of using window functions and outer selects on our DVD rental database. It will help you get your head around partitioning data in a window function and grouping.
Views: 207
SQL - Do Customers Spend More On 1st or 2nd Order?
 
05:40
Using window functions and some fairly sophisticated SQL, we determine whether buyers spend more on their first or second order.
Views: 115
An Alternate Way to Read the Elevation and Time Points
 
08:27
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 In this video we parse the Garmin GPS file (.gpx) using Beautiful Soup and convert the data into a pandas DataFrame. The original workout is here: https://connect.garmin.com/modern/activity/1734079861 Have you done the incline? If so, post your time!
Views: 331
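A stdlib-only sketch of pulling track points out of a GPX file, swapping Beautiful Soup for xml.etree.ElementTree. The GPX snippet is made up (coordinates and times are illustrative, not from the workout above) and assumes the GPX 1.1 namespace; real Garmin exports may also contain extension elements, which this code simply ignores.

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up two-point track in GPX 1.1 format.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="38.8710" lon="-104.9150"><ele>1980.2</ele><time>2017-05-06T14:30:00Z</time></trkpt>
    <trkpt lat="38.8712" lon="-104.9160"><ele>1985.6</ele><time>2017-05-06T14:30:05Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def parse_trackpoints(gpx_text):
    """Return a list of dicts with lat, lon, ele and time for each trkpt."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        points.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "ele": float(pt.find("gpx:ele", GPX_NS).text),
            "time": pt.find("gpx:time", GPX_NS).text,
        })
    return points


points = parse_trackpoints(SAMPLE_GPX)
print(points)
```

A list of dicts like this can be passed straight to pandas.DataFrame(points) to get one row per track point.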
Python Sample Size to Generalize Population (Simulation using Random Module)
 
11:25
https://gist.github.com/jrjames83/02ed7352f22dd1da5ab8203b75c9f765 We simulate the process of determining how many random samples we need to take in order to understand the distributional (category) properties of a population far too large to analyze manually. We make good use of the random and collections modules, specifically random.sample and random.choices, along with the Counter class. It seems that around 400 samples from a population of 100k items gives a reasonably good approximation of how many items belong to each category in the overall population. (For a category that makes up half the population, 400 samples give a standard error of about sqrt(0.5 × 0.5 / 400) = 0.025, or 2.5 percentage points.)
Views: 257
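A minimal sketch of the experiment (not the gist's code). The four categories and their weights are assumptions; the function reports the worst error in estimated category share for a given sample size, which should shrink roughly like 1 / sqrt(sample size).

```python
import random
from collections import Counter

rng = random.Random(1)
CATEGORIES = ["a", "b", "c", "d"]
WEIGHTS = [0.5, 0.3, 0.15, 0.05]  # assumed category shares

population = rng.choices(CATEGORIES, weights=WEIGHTS, k=100_000)
true_share = {c: n / len(population) for c, n in Counter(population).items()}


def max_share_error(sample_size):
    """Largest absolute error in any category's estimated share."""
    counts = Counter(rng.sample(population, sample_size))
    return max(abs(counts[c] / sample_size - true_share[c]) for c in CATEGORIES)


for size in (25, 100, 400, 1600):
    print(size, round(max_share_error(size), 3))
```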
Advent of Code 2017: Day #2 Python List Comprehension Fun
 
05:50
https://gist.github.com/jrjames83/9307439769140165a75d16a1f5c537df Using Python to solve http://adventofcode.com/2017/day/2 in a single line. If you're still getting your head around list comprehensions, this might be a worthwhile 5 minutes!
Views: 47
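A sketch of the part-one checksum as I understand the puzzle (for each row, take the largest value minus the smallest, then add up the rows), written as a single generator expression; it is not necessarily the one-liner from the gist. split() with no arguments handles both tabs and spaces.

```python
def checksum(sheet):
    # Parse each line into ints, take max - min per row, and sum over rows.
    return sum(max(row) - min(row)
               for row in ([int(x) for x in line.split()]
                           for line in sheet.strip().splitlines()))


example = """5 1 9 5
7 5 3
2 4 6 8"""
print(checksum(example))
```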
SQL For Actors with the Most Films
 
04:10
Using the DVD rental DB, we find the actors with the most films. A good refresher on grouping and aggregation using SQL/Postgres.
Views: 210
Pandas Plotting Basics and Tableau Plots (easier)
 
09:06
Link to data file: https://gist.github.com/jrjames83/4de9d124e5f43a61be9cb2aee09c9e08 How do you create a dual-axis plot in matplotlib for Python? Then, using the handy df.to_clipboard() method, we throw the data into Tableau and quickly build a similar plot.
Views: 678
Pandigital Primes Euler 41 Final
 
10:41
Find the largest pandigital number that's also prime. Uses itertools; fairly straightforward. Using Python 3.6 and Jupyter Notebook. https://gist.github.com/jrjames83/7dffbf993dde6208661f145baceb7660 https://projecteuler.net/problem=41
Views: 52
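A sketch of one way to solve it with itertools (not necessarily the gist's approach). A shortcut makes it fast: the digits 1..n sum to a multiple of 3 for n = 2, 3, 5, 6, 8 and 9, so those pandigitals are always divisible by 3, leaving only 4- and 7-digit candidates. Feeding permutations a descending digit string yields candidates in descending numeric order, so the first prime found is the largest.

```python
from itertools import permutations


def is_prime(n):
    """Simple trial division; fine for numbers up to ~10 million."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def largest_pandigital_prime():
    # Only 7- and 4-digit pandigitals can be prime (digit-sum rule above).
    for n_digits in (7, 4):
        digits = "7654321"[-n_digits:]  # e.g. "4321" when n_digits == 4
        # Permutations of a descending string come out in descending order.
        for perm in permutations(digits):
            candidate = int("".join(perm))
            if is_prime(candidate):
                return candidate
    return None


print(largest_pandigital_prime())
```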
