Home
Videos uploaded by user “MachineLearning with Python”
Python for Machine Learning - Part 23 - Feature Scaling - StandardScaler
 
05:23
Github Link - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part23

'''
Topics to be covered -
Feature Scaling - StandardScaler

(Xi - Xmean) / (standard deviation of that feature)
'''
from sklearn import preprocessing
import numpy as np

x = np.array([[-400], [-100], [0], [100], [400]])

standardscaler = preprocessing.StandardScaler()
x_scaler = standardscaler.fit_transform(x)
print(x_scaler)

#-----------------------------------
x1 = np.array([[1,2,3],
               [4,5,6],
               [7,8,9]])

standardscaler1 = preprocessing.StandardScaler()
x_scaler1 = standardscaler1.fit_transform(x1)

#-------------------------------------
import pandas as pd

dataset = pd.read_csv('Age-Salary.csv')
features = dataset.iloc[:,[2,3]].values

standardscaler2 = preprocessing.StandardScaler()
x_scaler2 = standardscaler2.fit_transform(features)
print(x_scaler2)

###################### Applying it to a 3x3 matrix
x1 = np.array([[1,2,3],
               [4,5,6],
               [7,8,9]])

standardscaler1 = preprocessing.StandardScaler().fit(x1)
x_scaler1 = standardscaler1.transform(x1)
print(x_scaler1)

################################## Applying it to a Pandas dataset
import pandas as pd

dataset = pd.read_csv('Age-Salary.csv')
features = dataset.iloc[:,[2,3]].values

standardscaler_as = preprocessing.StandardScaler().fit(features)  # was MinMaxScaler, which belongs to Part 22
features_scale = standardscaler_as.transform(features)
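The standardization formula quoted above can be checked by hand with plain numpy. A minimal sketch (StandardScaler divides by the population standard deviation, i.e. ddof=0):

```python
import numpy as np

x = np.array([-400., -100., 0., 100., 400.])

# (Xi - Xmean) / (standard deviation of the feature), ddof=0 like StandardScaler
x_scaled = (x - x.mean()) / x.std()

print(x_scaled.mean())  # ~0 after scaling
print(x_scaled.std())   # ~1 after scaling
```

After standardization every feature has zero mean and unit standard deviation, which is exactly what the sklearn output shows.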
Python for Machine Learning | Install Graphviz | Install Pydotplus to visualize Decision Tree- P88
 
04:11
'''
Python for Machine Learning - Session # 88
Topic to be covered - How to install Graphviz and Pydotplus, and use them to
generate the Decision Tree graph in Anaconda/Spyder
'''
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets
from IPython.display import Image
from sklearn import tree
import pydotplus

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Create decision tree classifier object
#clf = DecisionTreeClassifier(random_state=0)
clf_decisiontree = DecisionTreeClassifier(criterion="entropy", random_state=0)
#clf = DecisionTreeClassifier(criterion="gini", random_state=0)

# Train model
model = clf_decisiontree.fit(X, y)

# Create DOT data
dot_data = tree.export_graphviz(clf_decisiontree, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names)

# Draw graph
graph = pydotplus.graph_from_dot_data(dot_data)

# Show graph
Image(graph.create_png())

# Create PDF
graph.write_pdf("iris2.pdf")

# Create PNG
graph.write_png("iris2.png")
Python for Machine Learning - Part 26 - Detect and Handle Outliers
 
07:16
Outliers - How to detect outliers and reduce their effect using a variable transformation such as log, square root, cube root or another suitable method.
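A minimal sketch of the variable-transformation idea described above: a log (or root) transform compresses large values, so an extreme observation sits much closer to the rest of the data. The salary numbers are made up for illustration:

```python
import numpy as np

salary = np.array([30000., 35000., 40000., 42000., 900000.])  # one extreme value

log_salary = np.log(salary)    # log transform
sqrt_salary = np.sqrt(salary)  # square-root transform
cbrt_salary = np.cbrt(salary)  # cube-root transform

# spread of the largest value relative to the median, before and after the transform
print(salary.max() / np.median(salary))          # large spread in the raw data
print(log_salary.max() / np.median(log_salary))  # drastically reduced after log
```

The outlier is not removed; its leverage on means, variances and fitted models is simply reduced.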
Python for Machine Learning - Part 28 - Get Dummies to transform Categorical Variables into Boolean
 
04:40
'''
Topic to be covered - Get Dummies
'''
import pandas as pd

data = {'firstname': ['Arun', 'Jebu', 'Venkat', 'Rekha', 'Majid', 'Mohsin'],
        'lastname': ['Kumar', 'Jacob', 'Raghavan', 'Singh', 'Khan', 'Khan'],
        'employmenttype': ['Service', 'Business', 'Student', 'Service', 'Business', 'Business'],
        'country': ['India', 'USA', 'USA', 'Sweden', 'Australia', 'Germany']}

df = pd.DataFrame(data, columns=['firstname', 'lastname', 'employmenttype', 'country'])

df1 = pd.get_dummies(df['employmenttype'])
df2 = pd.get_dummies(df['country'])

frames = [df, df1, df2]
result = pd.concat(frames, axis=1)

dataset = pd.read_csv('Datapreprocessing.csv')
dataset1 = pd.get_dummies(dataset['Country'])

frames1 = [dataset, dataset1]
result1 = pd.concat(frames1, axis=1)
Python for Machine Learning - Part 45 - Random State in Train Test Split
 
08:24
'''
Topic to be Covered - Importance of Random State in Train Test Split
'''
import pandas as pd
import numpy as np

df = pd.read_csv('Datapreprocessing.csv')

# Get the rows that contain NULL (NaN)
df.isnull().sum()

# Fill the NaN values for Occupation, Employment Status and Employement Type
col = ['Occupation', 'Employment Status', 'Employement Type']
df[col] = df[col].fillna(df.mode().iloc[0])

df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Salary'].fillna(df['Salary'].mean(), inplace=True)

#col1 = ['Age','Salary']
#df[col1] = df[col1].fillna(df.mean)

features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values

#------------------------------- L A B E L   E N C O D I N G ----------------#
from sklearn.preprocessing import LabelEncoder

encode = LabelEncoder()
features[:,0] = encode.fit_transform(features[:,0])
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])

############### S A M P L I N G
# sklearn.cross_validation was removed in later scikit-learn releases; use model_selection
from sklearn.model_selection import train_test_split

X_train2, X_test2, y_train2, y_test2 = train_test_split(features, labels,
                                                        test_size=.25, random_state=None)
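The effect of random_state can be seen without sklearn at all: a fixed seed makes the underlying shuffle, and hence the split, reproducible, while random_state=None reshuffles on every run. A small numpy sketch of the same idea:

```python
import numpy as np

data = np.arange(10)

# same seed -> identical permutation -> identical train/test split every run
split_a = np.random.RandomState(0).permutation(data)
split_b = np.random.RandomState(0).permutation(data)
print(np.array_equal(split_a, split_b))  # True

# a different seed generally produces a different ordering
split_c = np.random.RandomState(1).permutation(data)
print(split_c)
```

This is why fixing random_state matters for comparing models: otherwise each run trains and tests on different rows.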
Python for Machine Learning - Part 15 - Handling Missing Values Using Imputer
 
15:48
Github link for .csv file - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part15

Topics to be covered :
1. First approach - remove the records that contain the missing values.
2. Second approach - use Imputer.
3. Third approach - use groupby and fill the missing values.

Second approach using IMPUTER:

features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values

# Imputer comes from older scikit-learn; newer versions provide sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer

imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# 2 step transformation - Fit and Transform
imputer.fit(features[:,[1,6]])
features[:,[1,6]] = imputer.fit_transform(features[:,[1,6]])

df1 = pd.DataFrame(features)

# Fill the missing values for non-numeric columns
cols = ['Occupation', 'Employment Status', 'Employement Type']
df[cols] = df[cols].fillna(df.mode().iloc[0])
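The third approach listed above (groupby plus fill) can be sketched with pandas alone, which also sidesteps the old-vs-new Imputer API. The frame and column names here are made up for illustration:

```python
import pandas as pd
import numpy as np

# toy frame standing in for the video's CSV (columns are illustrative)
df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR'],
    'Salary': [50000., np.nan, 30000., np.nan],
})

# third approach: fill each missing value with the mean of its own group
df['Salary'] = df.groupby('Department')['Salary'].transform(lambda s: s.fillna(s.mean()))
print(df)
```

Group-wise filling is often more faithful than a global mean, since an HR salary gap gets an HR-typical value rather than the company-wide average.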
Python for Machine Learning - Part 27 - Detecting Outliers using Elliptic Envelope
 
07:17
Detecting Outliers using Elliptic Envelope

'''
Topic to be covered - Detect Outliers using Elliptic Envelopes
'''
import pandas as pd
import numpy as np
from sklearn.covariance import EllipticEnvelope

dataset = pd.read_csv('Salary.csv')

X = np.array([[100,100],
              [1,1],
              [2,4],
              [5,6],
              [6,8]])

outlier = EllipticEnvelope(contamination=0.1)
outlier.fit(X)
prediction1 = outlier.predict(X)
print(prediction1)

#--------------------------------------#
features = dataset.iloc[:,[1,2]].values

outlier1 = EllipticEnvelope(contamination=0.1)
outlier1.fit(features)
prediction2 = outlier1.predict(features)
print(prediction2)

dataset['outliers'] = prediction2
Python for Machine Learning | Preprocessing | fit, transform and fit_transform - P83
 
12:19
""" Python for Machine Learning - Session # 83 Topic to be covered - How fit(), transform() and fit_transform() works ? OR Difference between fit(), transform() and fit_transform() """ import pandas as pd from sklearn.preprocessing import Imputer, LabelEncoder df = pd.read_csv('Datapreprocessing.csv') imputer = Imputer(missing_values='NaN',strategy='mean',axis=0) imputer.fit(df[['Age','Salary']]) X = imputer.transform(df[['Age','Salary']]) imputer.fit_transform(df[['Age','Salary']]) ############################################################################### encode = LabelEncoder() encode.fit(df['Country']) encode.transform(df['Country']) encode.fit_transform(df['Country']) ############################################################################### import numpy as np from sklearn.preprocessing import StandardScaler x1 = np.array([[1,2,3], [4,5,6], [7,8,9]]) standscaler = StandardScaler() x_scaler = standscaler.fit_transform(x1) print(x_scaler) ''' (Xi - Xmean) / (standard Deviation of that feature) ''' standscaler.fit(x1) standscaler.transform(x1)
Python for Machine Learning - Part 22 - Feature Scaling - Min Max Scaler
 
09:59
Github Link - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part22

'''
Topics to be covered - Feature Scaling
1. Min Max Scaler
2. Standard Scaler
3. Normalize
4. Binarize
'''
from sklearn import preprocessing
import numpy as np

x = np.array([[-400], [-100], [0], [100], [400]])

minmaxscaler = preprocessing.MinMaxScaler(feature_range=(0,1))
x_scaler = minmaxscaler.fit_transform(x)
print(x_scaler)

'''
(Xi - Xmin) / (Xmax - Xmin)

(-100 - (-400)) / (400 - (-400))
= (-100 + 400) / (400 + 400)
= 300/800 = 3/8
'''

###################### Applying it to a 3x3 matrix
x1 = np.array([[1,2,3],
               [4,5,6],
               [7,8,9]])

minmaxscaler1 = preprocessing.MinMaxScaler(feature_range=(0,1))
x_scaler1 = minmaxscaler1.fit_transform(x1)
print(x_scaler1)

################################## Applying it to a Pandas dataset
import pandas as pd

dataset = pd.read_csv('Age-Salary.csv')
features = dataset.iloc[:,[2,3]].values

minmaxscaler_as = preprocessing.MinMaxScaler(feature_range=(0,2))
features_scale = minmaxscaler_as.fit_transform(features)
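The worked example in the comment, (-100 - (-400)) / (400 - (-400)) = 300/800 = 0.375, can be confirmed with plain numpy; MinMaxScaler with feature_range=(0,1) is exactly this formula:

```python
import numpy as np

x = np.array([-400., -100., 0., 100., 400.])

# (Xi - Xmin) / (Xmax - Xmin)
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # -100 maps to 300/800 = 0.375
```

The minimum always maps to 0 and the maximum to 1; a different feature_range just rescales this interval linearly.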
Python for Machine Learning - Part 21 - Standard Deviation & Variance
 
03:15
Standard Deviation & Variance

import numpy as np

y = [0,1,2,3,4,5,6,7,8]

print(np.std(y))  # standard deviation
print(np.var(y))  # variance
Python for Machine Learning - Part 18 - Sampling - Train Test Split
 
05:53
Topic to be covered : Sampling using Train_Test_Split

# sklearn.cross_validation was removed in later scikit-learn releases; use model_selection
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                    test_size=.25, random_state=0)
Python for Machine Learning | Detect Outliers using Mathematical Formula - P69
 
03:49
""" Session # 69 Topic to be covered - Detect Outlier using the Mathematical Formula Step 1 - Sort the Pandas Column Step 2 - Calculate Lower Quartile, Upper Quartile and Inter Quartile Q1 = Lower Quartile Q3 = Upper Quartile IQR = Inter Quartile Range lower Boundary = Q1 - 1.5 * IQR Upper Boundary = Q3 + 1.5 * IQR """ import pandas as pd import numpy as np df = pd.DataFrame() df['count'] = [5,8,6,2,9,11,15,16,25,40,23] df.sort_values(by='count',inplace=True) df.describe() Q1, Q2, Q3 = np.percentile(df,[25,50,75]) IQR = Q3 - Q1 lower_boundary = Q1 - 1.5 * IQR upper_boundary = Q3 + 1.5 * IQR
Python for Machine Learning - Part 11- Iterate over Column of a Pandas Dataset
 
07:29
Topic to be covered :
1. How to iterate over the columns of a Pandas Dataframe
Python for Machine Learning - Part 19 - Measures of Central Tendency 1- Mean, Median, Mode
 
05:37
'''
Measures of Central Tendency & Measures of Dispersion -
Mean, Median, Mode, Standard Deviation, Variance, Quartiles,
Lower Quartile (Q1), Upper Quartile (Q3), Inter-Quartile Range (Q3 - Q1),
Semi-Inter-Quartile Range (Q3 - Q1)/2
'''
import statistics as st
import numpy as np

x = [0,1,2,3,4,5,6,7,8,9]

st.mean(x)
st.median(x)
st.median_low(x)
st.median_high(x)
st.mode(x)  # note: before Python 3.8 this raises StatisticsError when no value repeats
st.stdev(x)
st.variance(x)

Q1 = np.percentile(x, 25)
Q3 = np.percentile(x, 75)
Python for Machine Learning - Part 8 - Groupby with Pandas | Filter rows using Groupby
 
05:24
Topic to be covered:
1. Groupby dataset based on aggregations like mean, median, mode, sum, max, count etc.

import pandas as pd
import numpy as np

df = pd.read_csv('train.csv')

df.groupby('Sex').mean()
df.groupby('Sex')['Age'].mean()
df.groupby(['Sex','Survived'])['Age'].mean()
df.groupby(['Sex','Pclass','Survived'])['Age'].mean()
Python for Machine Learning - Predictions with Simple Linear Regression 5 - Residual sum of squares
 
05:53
'''
RSS  - Residual Sum of Squares
MSE  - Mean Squared Error
RMSE - Root Mean Squared Error
'''
import numpy as np

RSS = ((y_test - y_predtest)**2).sum()
MSE = np.mean((y_test - y_predtest)**2)
RMSE = np.sqrt(MSE)
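With tiny arrays the three quantities are easy to verify by hand; y_test and y_predtest below are stand-ins for the regression outputs from the earlier parts:

```python
import numpy as np

y_test = np.array([3., 5., 7.])
y_predtest = np.array([2., 6., 7.])  # residuals: 1, -1, 0

RSS = ((y_test - y_predtest) ** 2).sum()   # 1 + 1 + 0 = 2
MSE = np.mean((y_test - y_predtest) ** 2)  # RSS / n = 2/3
RMSE = np.sqrt(MSE)                        # back in the units of y

print(RSS, MSE, RMSE)
```

RMSE is usually the most readable of the three, since squaring puts MSE and RSS in squared units of the target.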
Python for Machine Learning - Part 3 - Replace a cell value, Rename a column name of Pandas dataset
 
06:19
Topics to be covered :
1. How to replace a value.
2. How to rename a column.

# Refer to the same Titanic Train dataset

# How to replace a value?
dataset['Sex'].replace("male", 'Men')
dataset['col1'] = dataset['Sex'].replace("male", 'Men')

# How to rename a column
dataset = dataset.rename(columns={'col1': 'col2'})
dataset = dataset.rename(columns={'col2': 'col3'})
dataset = dataset.rename(columns={'Parch': 'Parch1'})
Python for Machine Learning | Create, Describe Dataframes | Operations with Pandas Dataset - P1
 
17:10
In this Video we cover the below topics:
1. How to create a Dataframe
2. How to Describe a Dataframe

import pandas as pd
import numpy as np

dataset = pd.read_csv('train.csv')
Python for Machine Learning - Part 30 - Prediction with Simple Linear Regression 1
 
07:31
Github link for python and .csv file - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part30-31-32-33-34-35

'''
Topic to be covered - Simple Linear Regression
Scenario - We have the Years of Experience and the Salary with us. We will train
the model using LinearRegression from scikit-learn and predict the Salary.
'''
# Step 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 2 - Load the dataset
df_train = pd.read_csv('SalaryData_Train.csv')

# Step 3
print(df_train.head())

# Step 4 - Visualisation
plt.scatter(df_train['YearsExperience'], df_train['Salary'])
plt.xlabel('Years of Experience')
plt.ylabel('Salary in ???')
plt.title('Salary V/S Years of Experience')
plt.show()

# Step 5 - Feature Extraction
feature = df_train.iloc[:,:-1].values
labels = df_train.iloc[:,1].values
Python for Machine Learning - Part 36 - R Square and Adjusted R Square 1
 
12:33
'''
RSquare and Adjusted RSquare
'''
import pandas as pd

df1 = pd.DataFrame()
X1 = [10,20,30,40,50]
Y1 = [3,4,2,5,6]
df1['X'] = X1
df1['Y'] = Y1

Ymean = df1['Y'].mean()
df1['Y-Ymean'] = df1['Y'] - Ymean
df1['(Y-Ymean)Square'] = (df1['Y'] - Ymean)**2

df1['Ybar'] = 1.9 + 0.07 * df1['X']
df1['Ybar - Ymean'] = df1['Ybar'] - Ymean
df1['(Ybar - Ymean)Square'] = (df1['Ybar'] - Ymean)**2

R_Square = df1['(Ybar - Ymean)Square'].sum() / df1['(Y-Ymean)Square'].sum()

'''
Adjusted_R_Square = 1 - (1 - R_Square) * (N - 1) / (N - K - 1)
N = no. of points in the data
K = no. of independent variables
'''
N = 5
K = 1
Adjusted_R_Square = 1 - (1 - R_Square) * (N - 1) / (N - K - 1)
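A self-contained numpy check of R² and the standard adjusted R² = 1 - (1 - R²)(N - 1)/(N - K - 1), using the same toy numbers (the fitted line 1.9 + 0.07x is the least-squares fit for these five points):

```python
import numpy as np

y = np.array([3., 4., 2., 5., 6.])
y_hat = 1.9 + 0.07 * np.array([10., 20., 30., 40., 50.])  # fitted values

ss_total = ((y - y.mean()) ** 2).sum()          # 10.0
ss_explained = ((y_hat - y.mean()) ** 2).sum()  # 4.9
r_square = ss_explained / ss_total              # 0.49

N, K = 5, 1  # observations, independent variables
adjusted_r_square = 1 - (1 - r_square) * (N - 1) / (N - K - 1)

print(r_square, adjusted_r_square)  # adjusted value is lower, 0.32 here
```

Adjusted R² penalizes extra predictors: it only rises when a new variable improves the fit by more than chance would.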
Python for Machine Learning - Part 16 - Label Encoding - Preprocessing
 
08:42
Topic to be covered - Label Encoding

import pandas as pd
import numpy as np

df = pd.read_csv('Datapreprocessing.csv')

# Get the rows that contain NULL (NaN)
df.isnull().sum()

# Fill the NaN values for Occupation, Employment Status and Employement Type
col = ['Occupation', 'Employment Status', 'Employement Type']
df[col] = df[col].fillna(df.mode().iloc[0])

features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values

from sklearn.preprocessing import Imputer, OneHotEncoder

imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# 2 step transformation - Fit and Transform
imputer.fit(features[:,[1,6]])
features[:,[1,6]] = imputer.fit_transform(features[:,[1,6]])

#------------------------------- L A B E L   E N C O D I N G ----------------#
from sklearn.preprocessing import LabelEncoder

encode = LabelEncoder()
features[:,0] = encode.fit_transform(features[:,0])
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])
Python for Machine Learning | Handling Missing values in Time Series Analysis | Interpolate - P70
 
03:21
'''
Session # 70
Topic to be covered - Handling the missing values in a Time Series dataset
Learn - Interpolate, ffill, bfill
'''
import pandas as pd
import numpy as np

ts_index = pd.date_range('01/01/2018', periods=8, freq='W')
df = pd.DataFrame(index=ts_index)
df['no_of_cars'] = [100,200,np.nan,np.nan,500,np.nan,np.nan,800]

# Use Interpolate
df.interpolate()

# Forward Fill
df.ffill()

# Backward Fill
df.bfill()
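A linear interpolate fills each gap with evenly spaced values between its neighbours, which is easy to see on the same car counts (kept as a plain Series here for brevity):

```python
import pandas as pd
import numpy as np

s = pd.Series([100., 200., np.nan, np.nan, 500., np.nan, np.nan, 800.])

print(s.interpolate().tolist())  # gaps become 300, 400 and 600, 700
print(s.ffill().tolist())        # forward fill repeats the last seen value
print(s.bfill().tolist())        # backward fill pulls the next value back
```

interpolate() suits smoothly varying series; ffill/bfill suit step-like series where the last (or next) reading is the best guess.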
Python for Machine Learning - Part 0 - Install Anaconda for Windows
 
03:37
In this video I will show how to install Anaconda on the Windows 7 operating system.
Python for Machine Learning - Part 9 - Groupby based on Timestamp
 
06:11
Topic to be covered: Groupby based on Timestamp

Code:

import pandas as pd
import numpy as np

# Suppose the date range starts from 1st Jan, 2017 and there are 200000 entries
# with a frequency of every 60 seconds
ts_index = pd.date_range('01/01/2017', periods=200000, freq='60S')

# Create the dataframe
df = pd.DataFrame(index=ts_index)

# Add a column 'No_of_Vehicles' with random numbers between 1 and 10
df['No_of_Vehicles'] = np.random.randint(1,10,200000)

# Groupby on a weekly basis
df.resample('W').sum()

# Groupby on a bi-weekly basis
df.resample('2W').sum()

# Groupby on a monthly basis
df.resample('M').sum()
df.resample('M', label='left').sum()
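A smaller version of the same resample pattern makes the weekly buckets easy to inspect: 14 daily entries with value 1, so each weekly sum is just a row count (1 Jan 2017 was a Sunday, and 'W' bins end on Sundays):

```python
import pandas as pd
import numpy as np

idx = pd.date_range('01/01/2017', periods=14, freq='D')  # 14 daily entries
df = pd.DataFrame({'No_of_Vehicles': np.ones(14, dtype=int)}, index=idx)

weekly = df.resample('W').sum()
print(weekly)  # partial weeks at the edges sum fewer than 7 days
```

The first bucket holds only Jan 1 (a Sunday), the second the full week Jan 2-8, and the last the six remaining days, so the sums are 1, 7 and 6.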
Python for Machine Learning - Part 14 - Handling Missing Values by dropping them
 
07:35
Github Link for .csv file - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part14

Topics to be covered :
1. First approach - remove the records that contain the missing values.
2. Second approach - use Imputer.
3. Third approach - use groupby and fill the missing values.

Code:

import pandas as pd
import numpy as np

df = pd.read_csv('Datapreprocessing.csv')

# Get the rows that contain NULL (NaN)
df.isnull().sum()

# Data without missing values with respect to columns
data_without_missing_values_cols = df.dropna(axis=1)

# Data without missing values with respect to rows
data_without_missing_values_rows = df.dropna(axis=0)

# If we want to get the columns in a dataset with missing values, we can use the following approach
cols_with_missing_values = [col for col in df.columns if df[col].isnull().any()]

# To drop the columns that contain the missing values, we can follow the approach below
reduced_data = df.drop(cols_with_missing_values, axis=1)
Python for Machine Learning - Part 13 - Merging Dataframes
 
06:32
Topics to be covered:
1. Merge Dataframes

import pandas as pd
import numpy as np

df1 = pd.DataFrame()
ID = [1001,1002,1003,1004]
Name = ['Virat Kohli','Susan Whistler','Micheal Scofield','Sarah Wilson']
Country = ['India','Australia','England','Canada']
df1['ID'] = ID
df1['Name'] = Name
df1['Country'] = Country

df2 = pd.DataFrame()
ID = [1003,1004,2003,2004]
Salary = [20000,15000,25000,18000]
df2['ID'] = ID
df2['Salary'] = Salary

df3 = pd.merge(df1, df2, on='ID')
df4 = pd.merge(df1, df2, on='ID', how='outer')
df_left = pd.merge(df1, df2, on='ID', how='left')
df_right = pd.merge(df1, df2, on='ID', how='right')
df5 = pd.merge(df1, df2, left_on='ID', right_on='ID')
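The join types above differ only in which IDs survive; pandas can also tag each row's origin with indicator=True, which makes the difference visible at a glance (a sketch with the same two frames):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1001, 1002, 1003, 1004],
                    'Country': ['India', 'Australia', 'England', 'Canada']})
df2 = pd.DataFrame({'ID': [1003, 1004, 2003, 2004],
                    'Salary': [20000, 15000, 25000, 18000]})

inner = pd.merge(df1, df2, on='ID')                      # only the 2 shared IDs
outer = pd.merge(df1, df2, on='ID', how='outer',
                 indicator=True)                          # all 6 IDs, tagged by origin

print(len(inner), len(outer))
print(outer['_merge'].value_counts().to_dict())
```

The _merge column reports 'left_only', 'right_only' or 'both' per row, a quick sanity check that a join dropped (or kept) exactly what you expected.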
Python for Machine Learning - Part 5 - How to manipulate NAN (Not a number)
 
10:22
In this video I will show how to deal with NaN (not a number) in pandas Series and DataFrames.

"""
Created on Tue Dec 5 10:31:16 2017
Topics to be covered - How to Handle Missing Values
@author: Aly
"""
import pandas as pd
import numpy as np

Series1 = pd.Series([7, 6.8, 'Avengers', np.nan, 'Apple'])

# isnull() and notnull()
# both of the above functions return Boolean (True or False)
Series1.isnull().sum()
Series1.notnull().sum()

###################################
dataset = pd.read_csv('train.csv')

dataset['Age'].notnull().sum()
dataset[dataset['Age'].notnull()].head()

###################################
# What happens if we try to replace some value with NaN?
dataset['Sex'] = dataset['Sex'].replace('female', np.nan)
Python for Machine Learning | Multivariate Linear Regression with Solved Examples - P62
 
22:30
""" Github link - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part62 Topic to be Covered - Multivariate Linear Regression @author: aly """ ''' #Step 1 - Import the necessary libraries and the dataset #Step 2 - Plot the Seaborn Pairplot #Step 3 - Plot the Seaborn Heatmap #Step 4 - Extract the Features and Labels #Step 5 - Cross Validation (train_test_split) #Step 6 - Create the Linear Model (LinearRegression) #Step 7 - Interpreting the Coefficient and the Intercept #Step 8 - Predict the output #Step 9 - Predict the Score (% Accuracy) #Step 10- Verification of the Predicted Value #Step 11- Calculate the MSE and RMSE ''' ''' y = m*x + c y = b0*x0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn Price per week Population of city Monthly income of riders Average parking rates per month Number of weekly riders ''' #Step 1 - Import the necessary libraries and the dataset import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import numpy as np df = pd.read_csv('taxi.csv') #Step 2 - Plot the Seaborn Pairplot sns.pairplot(df) #Step 3 - Plot the Seaborn Heatmap sns.heatmap(df.corr(),linewidth = 0.2, vmax=1.0, square=True, linecolor='red',annot=True) #Step 4 - Extract the Features and Labels features = df.iloc[:,0:-1].values labels = df.iloc[:,-1].values #Step 5 - Cross Validation (train_test_split) from sklearn.cross_validation import train_test_split X_train, X_test, y_train, y_test = train_test_split(features,labels,test_size=0.3,random_state=0) #Step 6 - Create the Linear Model (LinearRegression) from sklearn.linear_model import LinearRegression regressor = LinearRegression() regressor.fit(X_train,y_train) #Step 7 - Interpreting the Coefficient and the Intercept y_pred = regressor.predict(X_test) #Step 8 - Interpreting the Coefficient and the Intercept print(regressor.coef_) print(regressor.intercept_) #Step 9 - Predict the Score (% Accuracy) print('Train Score :', 
regressor.score(X_train,y_train)) print('Test Score:', regressor.score(X_test,y_test)) #Step 10- Verification of the Predicted Value #y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn y_output0 = regressor.intercept_ + regressor.coef_[0]*X_test[0][0] + regressor.coef_[1]*X_test[0][1] + regressor.coef_[2]*X_test[0][2] + regressor.coef_[3]*X_test[0][3] y_output1 = regressor.intercept_ + regressor.coef_[0]*X_test[1][0] + regressor.coef_[1]*X_test[1][1] + regressor.coef_[2]*X_test[1][2] + regressor.coef_[3]*X_test[1][3] #Step 11- Calculate the MSE and RMSE from sklearn import metrics print('MSE :', metrics.mean_squared_error(y_test,y_pred)) print('RMSE :', np.sqrt(metrics.mean_squared_error(y_test,y_pred))) ############################################################################### X1 = [[80, 1770000, 6000, 85]] out1 = regressor.predict(X1)
Python for Machine Learning - Part 17 - One Hot Encoding - Preprocessing
 
16:48
In this Video we will work with One Hot Encoding:

import pandas as pd
import numpy as np

df = pd.read_csv('Datapreprocessing.csv')

# Get the rows that contain NULL (NaN)
df.isnull().sum()

# Fill the NaN values for Occupation, Employment Status and Employement Type
col = ['Occupation', 'Employment Status', 'Employement Type']
df[col] = df[col].fillna(df.mode().iloc[0])

features = df.iloc[:,:-1].values
features1 = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values

# Imputer and the categorical_features argument below come from older scikit-learn;
# newer versions use sklearn.impute.SimpleImputer and sklearn.compose.ColumnTransformer
from sklearn.preprocessing import Imputer, OneHotEncoder

imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# 2 step transformation - Fit and Transform
imputer.fit(features[:,[1,6]])
features[:,[1,6]] = imputer.fit_transform(features[:,[1,6]])
features1[:,[1,6]] = imputer.fit_transform(features1[:,[1,6]])

#------------------------------- L A B E L   E N C O D I N G ----------------#
from sklearn.preprocessing import LabelEncoder

encode = LabelEncoder()
features[:,0] = encode.fit_transform(features[:,0])
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])

features1[:,0] = encode.fit_transform(features1[:,0])
features1[:,2] = encode.fit_transform(features1[:,2])
features1[:,3] = encode.fit_transform(features1[:,3])
features1[:,4] = encode.fit_transform(features1[:,4])
features1[:,5] = encode.fit_transform(features1[:,5])

df1 = pd.DataFrame(features)

#--------------------------- ONE HOT ENCODING -------------------------------#
hotencode = OneHotEncoder(categorical_features=[0])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[7])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[9])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[11])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[13])
features = hotencode.fit_transform(features).toarray()

#--
hotencode = OneHotEncoder(categorical_features=[0])
features1 = hotencode.fit_transform(features1).toarray()
hotencode = OneHotEncoder(categorical_features=[2])
features1 = hotencode.fit_transform(features1).toarray()
hotencode = OneHotEncoder(categorical_features=[3])
features1 = hotencode.fit_transform(features1).toarray()
hotencode = OneHotEncoder(categorical_features=[4])
features1 = hotencode.fit_transform(features1).toarray()
hotencode = OneHotEncoder(categorical_features=[5])
features1 = hotencode.fit_transform(features1).toarray()
Python for Machine Learning - Part 2 - Navigate Dataframes rows and columns based on Conditions
 
10:43
Topics to be covered :
1. How to Navigate the Dataframe
2. How to select the rows of a pandas Dataframe based on conditions

Code:

"""
@author: aly
Topics to be covered :
1. How to create a Dataframe - Completed
2. How to Describe a Dataframe - Completed
3. How to Navigate the Dataframe - Completed
4. How to select the rows of a pandas Dataframe based on conditions - Completed
5. How to replace a value - Completed
6. How to rename a column - Completed
"""
import pandas as pd
import numpy as np

dataset = pd.read_csv('train.csv')

# How to create a Dataframe
df = pd.DataFrame()

# Add Columns
df['Name'] = ['Steven Smith', 'Virat Kohli']
df['Age'] = [26,25]
df['Country'] = ['Aus','Ind']

# How to add rows at the bottom
# Create a new row
new_row = pd.Series(['Angelo Mathews', 28, 'Sri'], index=['Name', 'Age', 'Country'])
#df.append(new_row, ignore_index=False)

df = pd.DataFrame([['AAA',30,'Nepal']], columns=list('ABC'))
df = pd.DataFrame([['AAA',30,'Nepal']], columns=['Name', 'Age', 'Country'])

# How to Describe a Dataset
df.describe()

# Head
dataset.head()

# Tail
dataset.tail()

# Display the first 3 rows of a dataset
dataset.head(3)

# Shape of a dataset
dataset.shape

# How to Navigate the Dataframe
# loc and iloc
dataset.iloc[0][4]

# Select the first 5 rows
dataset.iloc[:5]

dataset.iloc[4:2]  # empty selection - start is past stop
dataset.iloc[2:5]

#######################
# How to set the index of a dataset
df = df.set_index(df['Name'])

# How to show the row based on the name
#df.loc['Virat Kohli']

# How to select the rows of a pandas Dataframe based on conditions
dataset[dataset['Pclass'] == 3].head(3)

# Get the data based on conditions on more than 1 column

# How to replace a value?
dataset['Sex'].replace("male", 'Men')
dataset['col1'] = dataset['Sex'].replace("male", 'Men')

# How to rename a column
dataset = dataset.rename(columns={'col1': 'col2'})
dataset = dataset.rename(columns={'col2': 'col3'})
dataset = dataset.rename(columns={'Parch': 'Parch1'})
Python for Machine Learning - Part 42 - Data Visualisation 1 - Plots, Subplots
 
13:12
'''
Topic to be Covered - Data Visualisation
Link for the Dataset - http://www.randalolson.com/2014/06/14/percentage-of-bachelors-degrees-conferred-to-women-by-major-1970-2012/
'''
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('percent-bachelors-degrees-women-usa.csv')

#---------------------------- No 1
plt.plot(df['Year'], df['Agriculture'], color='red')
plt.plot(df['Year'], df['Architecture'], color='blue')
plt.plot(df['Year'], df['Art and Performance'], color='green')
plt.plot(df['Year'], df['Physical Sciences'], color='red')
plt.plot(df['Year'], df['Computer Science'], color='blue')

#-------------------------- No 2
''' a 1 x 3 grid of subplots: 1,3,1  1,3,2  1,3,3 '''
plt.subplot(1,3,1)
plt.plot(df['Year'], df['Architecture'], color='blue')
plt.title('Architecture')

plt.subplot(1,3,2)
plt.plot(df['Year'], df['Computer Science'], color='green')
plt.title('Computer Science')

plt.subplot(1,3,3)
plt.plot(df['Year'], df['Physical Sciences'], color='yellow')
plt.title('Physical Sciences')
plt.show()

#----------------------------- No 3
plt.plot(df['Year'], 100 - df['English'], c='blue', label='Men')
plt.plot(df['Year'], df['English'], c='red', label='Women')
plt.title('English Enrollment Comparison between the genders')
plt.xlabel('Year')
plt.ylabel('Enrollment in percentage')

#----------------------------- No 4
''' a 6 x 3 grid holding 17 plots '''
fig = plt.figure(figsize=(13,5))
ax1 = fig.add_subplot(6,3,1)
ax2 = fig.add_subplot(6,3,2)
ax3 = fig.add_subplot(6,3,3)
ax4 = fig.add_subplot(6,3,4)
ax5 = fig.add_subplot(6,3,5)
ax6 = fig.add_subplot(6,3,6)
ax7 = fig.add_subplot(6,3,7)
ax8 = fig.add_subplot(6,3,8)
ax9 = fig.add_subplot(6,3,9)
ax10 = fig.add_subplot(6,3,10)
ax11 = fig.add_subplot(6,3,11)
ax12 = fig.add_subplot(6,3,12)
ax13 = fig.add_subplot(6,3,13)
ax14 = fig.add_subplot(6,3,14)
ax15 = fig.add_subplot(6,3,15)
ax16 = fig.add_subplot(6,3,16)
ax17 = fig.add_subplot(6,3,17)

categories = ['Agriculture','Architecture','Art and Performance',
              'Biology','Business','Communications and Journalism',
              'Computer Science','Education','Engineering',
              'English','Foreign Languages','Health Professions',
              'Math and Statistics','Physical Sciences','Psychology',
              'Public Administration','Social Sciences and History']

ax = [ax1,ax2,ax3,ax4,ax5,ax6,ax7,ax8,ax9,ax10,ax11,ax12,ax13,ax14,ax15,ax16,ax17]

for i in range(len(categories)):
    ax[i].plot(df['Year'], df[categories[i]], c='red', label='Women')
    ax[i].plot(df['Year'], 100 - df[categories[i]], c='blue', label='Men')  # was mislabelled 'Women'
    ax[i].set_title(categories[i])
    ax[i].set_ylim(0,100)

plt.tight_layout()
plt.savefig('categories.jpeg')
plt.show()
Python for Machine Learning - Polynomial Linear Regression using Scikit Learn - P56
 
08:54
'''
Polynomial Linear Regression using Scikit Learn
'''
#y = 9450x + 25792
#y = 16.393x2 + 9259.3x + 26215
#y = -122.92x3 + 2099.4x2 - 718.71x + 38863
#y = 4.9243x4 - 236.59x3 + 2979.9x2 - 3314.2x + 41165
#y = 15.006x5 - 430.13x4 + 4409.7x3 - 19368x2 + 43652x + 8315

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('SalaryData_Train.csv')

features = df.iloc[:,0:1].values
labels = df.iloc[:,1:2].values

#plt.scatter(df_train['YearsExperience'], df_train['Salary'])
plt.scatter(features, labels)
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary V/s Years of Experience')
plt.show()

#Step 6 - Sampling
# sklearn.cross_validation was removed in later scikit-learn releases; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                    test_size=0.33, random_state=0)

# Create the Regression Model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

# Create the Polynomial Features
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=3)
x_poly = poly_reg.fit_transform(features)
regressor.fit(x_poly, labels)

# Test the model
y_pred = regressor.predict(poly_reg.fit_transform(X_test))

# Calculate the Accuracy
print('Polynomial Linear Regression Accuracy:',
      regressor.score(poly_reg.fit_transform(X_test), y_test))

for i in range(1,6):
    poly_reg = PolynomialFeatures(degree=i)
    x_poly = poly_reg.fit_transform(features)
    regressor.fit(x_poly, labels)
    print('Degree of Equation :', i)
    print('Coefficient :', regressor.coef_)
    print('Intercept :', regressor.intercept_)
    print('Accuracy Score:', regressor.score(poly_reg.fit_transform(X_test), y_test))
Python for Machine Learning - Part 43 - Data Visualisation 2 - xlim, ylim, legend, axis, axes
 
09:17
'''
Topics to be covered -
1. axes()
2. xlim(), ylim()
3. Alternative for xlim, ylim
4. legend
5. annotate
'''
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('percent-bachelors-degrees-women-usa.csv')

''' Task No 1 -- axes() '''
plt.axes([0.1,0.06,0.5,0.8])
plt.plot(df['Year'], df['Architecture'], color='b')

plt.axes([0.61,0.06,0.5,0.8])
plt.plot(df['Year'], df['Psychology'], color='g')

plt.axes([1.12,0.06,0.5,0.8])
plt.plot(df['Year'], df['Computer Science'], color='r')
plt.show()

##############################################################################
''' Task No 2 -- use xlim and ylim '''
''' Task No 3 -- plt.axis '''
plt.plot(df['Year'], df['Architecture'], color='g')
plt.plot(df['Year'], df['Psychology'], color='r')
plt.plot(df['Year'], 100 - df['Architecture'], color='y')
plt.plot(df['Year'], 100 - df['Psychology'], color='b')

# Add the axis labels
plt.xlabel('Year')
plt.ylabel('Degrees Awarded')

# Setting the x and y axis limits
#plt.xlim(1985,2005)
#plt.ylim(0,90)
plt.axis((1985,2005,0,90))

plt.title('Degrees awarded to women and men')
plt.show()

###############################################################################
''' legend() '''
plt.plot(df['Year'], df['Architecture'], color='g')
plt.plot(df['Year'], df['Psychology'], color='r')
#plt.plot(df['Year'], 100 - df['Architecture'], color='y')
#plt.plot(df['Year'], 100 - df['Psychology'], color='b')

# Add the axis labels
plt.xlabel('Year')
plt.ylabel('Degrees Awarded')

# Setting the x and y axis limits
#plt.xlim(1985,2005)
#plt.ylim(0,90)
plt.axis((1985,2005,0,90))

plt.title('Degrees awarded to women and men')
plt.legend(loc='lower center')
plt.show()
Python for Machine Learning - Part 52 - Polynomial Linear Regression with Numpy
 
13:16
Link for Github - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part52

''' Topics to be covered - Polynomial Regression without sklearn '''

import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([0.2, 1.1, 1.3, 1.2, 2.1, 1.9, 2.5])

'''
Degree 1 - y = 0.3321x + 0.475
Degree 2 - y = -0.0179x^2 + 0.4393x + 0.3857
Degree 3 - y = 0.0194x^3 - 0.1929x^2 + 0.8282x + 0.269
Degree 4 - y = -0.0087x^4 + 0.124x^3 - 0.5799x^2 + 1.2688x + 0.2242
Degree 5 - y = 0.0129x^5 - 0.2025x^4 + 1.1358x^3 - 2.7112x^2 + 2.7536x + 0.1873
'''

plt.plot(x, y, 'o')

# Fit polynomials of degree 1 to 5
coefint1 = np.polyfit(x, y, 1)
coefint2 = np.polyfit(x, y, 2)
coefint3 = np.polyfit(x, y, 3)
coefint4 = np.polyfit(x, y, 4)
coefint5 = np.polyfit(x, y, 5)

print(coefint1)
print(coefint2)
print(coefint3)
print(coefint4)
print(coefint5)

plt.plot(x, y, 'o')
plt.plot(x, np.polyval(coefint1, x), 'black')
plt.plot(x, np.polyval(coefint2, x), 'g')
plt.plot(x, np.polyval(coefint3, x), 'r')
plt.plot(x, np.polyval(coefint4, x), 'y')
plt.plot(x, np.polyval(coefint5, x), 'c')

for i in range(1, 6):
    print('Degree is ', i)
    p = np.polyfit(x, y, i)
    plt.plot(x, np.polyval(p, x), color='blue')
plt.show()

###############################################################################
# Predicted values from the degree-2 fit
ypred = coefint2[0]*x*x + coefint2[1]*x + coefint2[2]
print('Predicted Values:', ypred)
print('Actual Values :', y)

###############################################################################
# R-squared for the degree-1 (straight line) fit
ypred = coefint1[0]*x + coefint1[1]
print(ypred)
print(y)

yresidual = y - ypred
Sumofresidual = sum(pow(yresidual, 2))
SumofTotal = len(y) * np.var(y)
Rsquare = 1 - Sumofresidual / SumofTotal
print(Rsquare)

# Cross-check against scipy's linregress
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(pow(r_value, 2))
Python for Machine Learning - Part 38 - Categorical, Numerical, Ordinal and Discrete Data
 
07:46
''' Topic to be covered - Continuous and Discrete Variables '''

import pandas as pd

train = pd.read_csv('train.csv')
train.head()

'''
Categorical -- Sex, Survived, Embarked
Continuous  -- Age, Fare
Discrete    -- SibSp, Parch
Ordinal     -- Pclass
'''

# Get the distribution of the numerical features
train.describe()

# Get the distribution of the categorical features
train.describe(include=['O'])

# Get the distribution for both categorical and numerical features
train.describe(include='all')
Python for Machine Learning | Binning with Python | Transforming Numerical to Categorical- P77
 
06:30
'''
Session # 77
Topic to be covered - Binning with Python
Github Link - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part77
'''

import pandas as pd

df = pd.read_csv('Salary.csv')

bins = [0, 10000, 25000, 50000, 100000]
labels = ['low', 'medium', 'standard', 'high']

# labels must be passed as a keyword argument; passed positionally it is
# taken as pd.cut's 'right' parameter instead
df['ApplicantIncome_bin'] = pd.cut(df['ApplicantIncome'], bins, labels=labels)
print(df['ApplicantIncome_bin'].value_counts(sort=False))

df['categories'] = pd.cut(df['ApplicantIncome'], bins, labels=labels)
df['categories'].value_counts()
df['categories'].value_counts().plot(kind='barh')
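One behaviour worth knowing when binning as above: values outside the bin edges become NaN rather than raising an error. A minimal sketch with synthetic incomes (not the Salary.csv file):

```python
import pandas as pd

# Hypothetical incomes; the last value exceeds the top bin edge
incomes = pd.Series([5000, 12000, 30000, 75000, 150000])
bins = [0, 10000, 25000, 50000, 100000]
labels = ['low', 'medium', 'standard', 'high']

cats = pd.cut(incomes, bins, labels=labels)

# 150000 falls outside (0, 100000] and is binned as NaN
print(cats.isna().sum())   # 1
```

Checking `isna()` after `pd.cut` is a cheap guard against silently losing rows whose values fall outside the chosen edges.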
Python for Machine Learning - Part 39 - Pandas Pivot Table and Groupby Functionality
 
06:56
''' Topic to be covered - Pivot Table '''

import pandas as pd

train = pd.read_csv('train.csv')

# Group-by equivalents
train.groupby('Sex')[['Survived']].mean()
train.groupby(['Sex', 'Pclass'])['Survived'].aggregate('mean')

# Using a pivot table
train.pivot_table('Survived', index='Sex', columns='Pclass')

# Multi-level pivot with binned Age
age = pd.cut(train['Age'], [0, 20, 30, 80])
train.pivot_table('Survived', ['Sex', age], 'Pclass')

# Add quantile-binned Fare on the columns
fare = pd.qcut(train['Fare'], 3)
train.pivot_table('Survived', ['Sex', age], [fare, 'Pclass'])
Python for Machine Learning - Part 6 - Drop Rows and Columns of a Pandas Dataset1
 
07:28
Topics to be covered:
1. Dropping a single column of a dataset.
2. Dropping multiple columns of a dataset.
3. Dropping a column of a dataset based on the column index.
4. Dropping multiple columns of a dataset based on the column index.

import pandas as pd
import numpy as np

dataset = pd.read_csv('train.csv')

# 1. Dropping a single column based on the column name
df1 = dataset.drop('Name', axis=1)
print(df1.head())

# 2. Dropping multiple columns based on the column names
df1 = df1.drop(['Pclass', 'Cabin'], axis=1)
print(df1.head())

# 3. Dropping a column based on the column index
df1 = df1.drop(df1.columns[1], axis=1)
print(df1.head())

# 4. Dropping multiple columns based on the column indexes
df2 = df1.drop(df1.columns[[1, 3]], axis=1)
print(df2.head())

# 5. Deleting rows of a dataset and copying the result to a new dataframe
df3 = df2.drop(df2.index[[1, 3]])
print(df3.head())

# 6. Dropping duplicate values based on some conditions
df3 = df3[df3['Embarked'] != 'S'].head()
print(df3.head())

Dup = df3.drop_duplicates().head()
Dup = Dup.drop_duplicates(subset=['Embarked'])
print(Dup)

Dup1 = df3.drop_duplicates(subset=['Embarked'], keep='last')
print(Dup1)
Python for Machine Learning | Preprocessing | Stratify Parameter in train_test_split - P85
 
07:50
'''
Python for Machine Learning - Session # 85
Topic to be covered - Stratify Parameter in train_test_split

The stratify parameter ensures that the proportion of classes in the train
and test samples produced is the same as the proportion in the input dataset.
'''

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases

df_iris = datasets.load_iris()
features = df_iris.data[:, :2]
labels = df_iris.target

df_labels = pd.DataFrame(labels)
print(df_labels[0].value_counts())

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)
df_ytrain = pd.DataFrame(y_train)
df_ytest = pd.DataFrame(y_test)
print('################# Before Stratify Parameter is used')
print(df_ytrain[0].value_counts())
print(df_ytest[0].value_counts())

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, stratify=labels)
df_ytrain = pd.DataFrame(y_train)
df_ytest = pd.DataFrame(y_test)
print('################# After Stratify Parameter is used')
print(df_ytrain[0].value_counts())
print(df_ytest[0].value_counts())
Python for Machine Learning - Part 20 - Measures of Central Tendency 2 -  Mean, Median, Mode
 
08:31
'''
Measures of Central Tendency & Measures of Dispersion -
Mean, Median, Mode, Standard Deviation, Variance, Quartiles,
Lower Quartile (Q1), Upper Quartile (Q3), Inter-Quartile Range (Q3 - Q1),
Semi-Inter-Quartile Range (Q3 - Q1) / 2
'''

from scipy import stats
import numpy as np

matrix = np.array([[1, 2, 4],
                   [3, 5, 2],
                   [2, 1, 9]])

np.mean(matrix)

# Median: the matrix values 1 2 4 3 5 2 2 1 9, sorted, are
# 1 1 2 2 2 3 4 5 9 -- the middle value (median) is 2
np.median(matrix)

np.std(matrix)
np.var(matrix)

# Q1 or lower quartile
np.percentile(matrix, 25)   # 2
# Q3 or upper quartile
np.percentile(matrix, 75)   # 4

np.percentile(matrix, 25, axis=0)
np.percentile(matrix, 25, axis=1)

#-------------------------------#
# M O D E
matrix = np.array([[1, 2, 4],
                   [3, 1, 2],
                   [3, 1, 9]])
stats.mode(matrix)
stats.mode(matrix, axis=0)
stats.mode(matrix, axis=1)
stats.mode(matrix, axis=None)
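The docstring above lists the inter-quartile range (Q3 - Q1) and the semi-inter-quartile range (Q3 - Q1) / 2, but the session code stops at the raw percentiles. A minimal sketch computing both on the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 4],
                   [3, 5, 2],
                   [2, 1, 9]])

q1 = np.percentile(matrix, 25)   # lower quartile over all 9 values
q3 = np.percentile(matrix, 75)   # upper quartile over all 9 values

iqr = q3 - q1                    # inter-quartile range
semi_iqr = iqr / 2               # semi-inter-quartile range

print(q1, q3, iqr, semi_iqr)     # 2.0 4.0 2.0 1.0
```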
Python for Machine Learning - Part 29 - Probability Mass Function
 
05:24
''' Topic to be covered - Probability Mass Function (pmf) '''

import pandas as pd
import matplotlib.pyplot as plt

list1 = [1,1,6,3,2,3,4,2,3,5,5,6,5,5,5,5,5,2,2,3,3,5,2,2,5,6,2,2,2,3,6,6,2,4,3,2,3]
df = pd.DataFrame(list1)

# Count each outcome and divide by the total to get the probabilities
df1 = df[0].value_counts()
sum1 = len(df)

df2 = pd.DataFrame(df1)
df2 = df2.rename(columns={0: 'item_counts'})  # in pandas >= 2.0 the counts column is named 'count'
df2['item'] = df2.index
df2['probability'] = df2['item_counts'] / sum1

plt.bar(df2['item'], df2['probability'], color='r')
plt.show()
Python for Machine Learning - Part 4 - Builtin functions - Sum, Max, Min, Count, Unique etc
 
12:06
Topics to be covered:
1. How to find the sum, count, max, min and average.
2. How to find the unique values and their counts.

import pandas as pd
import numpy as np

dataset = pd.read_csv('train.csv')

# Get the maximum value of the column Age
print('Maximum value of column Age: ', dataset['Age'].max())

# Get the minimum value of the column Age
print('Minimum value of column Age: ', dataset['Age'].min())

# Find the sum of the rows in the column Age
print('Sum of the rows in the column Age:', dataset['Age'].sum())

# Get the count of the number of rows in column Age
print('Count of the no of values in the column Age :', dataset['Age'].count())

# Get the average/mean of the column Age
print('The Average age is :', dataset['Age'].mean())

# NaN - Not a Number

# How to find the unique values and their counts
dataset['Age'].count()
dataset['Sex'].count()
dataset['Sex'].unique()
dataset['Sex'].value_counts()
Python for Machine Learning | Binning based on Mathematical Formula - P78
 
03:33
'''
Session # 78
Topic to be covered - Binning with Python based on a Mathematical Formula
Github Link - https://github.com/technologycult/PythonForMachineLearning/tree/master/Part78
'''

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('Salary.csv')

plt.hist(df['ApplicantIncome'])
plt.show()

####################################
# Square-root rule: use sqrt(n) bins for n observations
no_of_bins = int(np.sqrt(len(df)))
plt.hist(df['ApplicantIncome'], no_of_bins)
plt.show()
Python for Machine Learning | Preprocessing | Feature Scaling | RobustScaler - 79
 
10:57
"""
Session # 79
Topic to be covered - Robust Scaler
"""

'''
Scale features using statistics that are robust to outliers.

Formula for the Robust Scaler: (Xi - Q2) / (Q3 - Q1)

This scaler removes the median and scales the data according to the
quantile range (defaults to the IQR: Interquartile Range). The IQR is the
range between the 1st quartile (25th percentile) and the 3rd quartile
(75th percentile).

Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. The median and
interquartile range are then stored to be used on later data via the
transform method.
'''

import pandas as pd
import numpy as np
from sklearn import preprocessing

x = np.array([[-500], [-100], [0], [100], [900]])

robust = preprocessing.RobustScaler()
x_robust = robust.fit_transform(x)
print(x_robust)

# Verify against the formula
Q1 = np.percentile(x, 25)
Q2 = np.percentile(x, 50)
Q3 = np.percentile(x, 75)
print('Lower Quartile :', Q1)
print('Median :', Q2)
print('Upper Quartile :', Q3)

robust_scaler1 = (x - Q2) / (Q3 - Q1)
print(robust_scaler1)

###############################################################################
# Applying it to a 3x3 matrix (quartiles are computed per column)
x1 = np.array([[2, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])
robust_scaler2 = robust.fit_transform(x1)
print(robust_scaler2)

'''
Column 1: Q1 = (2+4)/2 = 3,   Q2 = 4, Q3 = (4+7)/2 = 5.5
Column 2: Q1 = (2+5)/2 = 3.5, Q2 = 5, Q3 = (5+8)/2 = 6.5
'''

###############################################################################
# Applying it to a pandas dataset
df = pd.read_csv('Age-Salary.csv')
features = df.iloc[:, [3]].values

df['features_scaled'] = robust.fit_transform(features)

Q1 = np.percentile(features, 25)
Q2 = np.percentile(features, 50)
Q3 = np.percentile(features, 75)
df['Robust'] = (features - Q2) / (Q3 - Q1)
print(df.head())
Python for Machine Learning - Part 31 - Predictions with Simple Linear Regression 2
 
08:03
'''
Topic to be covered - Simple Linear Regression

Scenario - We have the years of experience and the salary with us. We will
train the model using LinearRegression from scikit-learn and predict the salary.
'''

# Step 1 - Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 2 - Load the dataset
df_train = pd.read_csv('SalaryData_Train.csv')

# Step 3 - Inspect the data
print(df_train.head())

# Step 4 - Visualisation
plt.scatter(df_train['YearsExperience'], df_train['Salary'])
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary V/S Years of Experience')
plt.show()

# Step 5 - Feature extraction
feature = df_train.iloc[:, :-1].values
labels = df_train.iloc[:, 1].values
print('Shape of Feature:', feature.shape)
print('Shape of Labels:', labels.shape)

# Step 6 - Sampling
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases
X_train, X_test, y_train, y_test = train_test_split(feature, labels, test_size=0.3, random_state=0)

# Step 7 - Create the linear regression model
from sklearn.linear_model import LinearRegression
reg = LinearRegression()

# Step 8 - Fit the regression (reg) model with the train dataset from the
# sampling step, then plot the fitted line against the training points
reg.fit(X_train, y_train)

plt.scatter(X_train, y_train, color='r')
plt.plot(X_train, reg.predict(X_train), color='b')
plt.show()
Python for Machine Learning | Preprocessing | How to Rescale the data using inverse_transform() -P80
 
05:42
'''
Python for Machine Learning - Session # 80
Topic to be covered - How to scale the data back to the original form using
inverse_transform()

The inverse_transform() functionality can be applied to all the scaling
algorithms we have covered:
1. Standard Scaler
2. MinMax Scaler
3. MaxAbs Scaler
4. Robust Scaler
'''
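The description lists the scalers but no code; a minimal sketch of inverse_transform() with StandardScaler, on an illustrative array (the same pattern applies to MinMaxScaler, MaxAbsScaler and RobustScaler):

```python
import numpy as np
from sklearn import preprocessing

x = np.array([[-400.], [-100.], [0.], [100.], [400.]])

scaler = preprocessing.StandardScaler()
x_scaled = scaler.fit_transform(x)            # scale to zero mean, unit variance
x_back = scaler.inverse_transform(x_scaled)   # map back to the original units

print(np.allclose(x, x_back))                 # True
```

Because fit() stores the statistics (mean and scale here), inverse_transform() can always undo the forward transform exactly, up to floating-point precision.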
Python for Machine Learning | Data Transformation | Visualisation with MinMaxScaler - P66
 
09:41
"""
Topic to be covered - How the data is transformed after applying the
MinMax Scaler (feature scaling was introduced in Part 22)

Formula for the transformation - (Xi - min(x)) / (max(x) - min(x))
"""

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

df = pd.read_csv('data_minmaxscaler.csv')

scaler = preprocessing.MinMaxScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2', 'x3'])

# Density plots before and after scaling
fig, (ob1, ob2) = plt.subplots(ncols=2, figsize=(5, 6))

ob1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ob1)
sns.kdeplot(df['x2'], ax=ob1)
sns.kdeplot(df['x3'], ax=ob1)

ob2.set_title('After Min-Max Scaling')
sns.kdeplot(scaled_df['x1'], ax=ob2)
sns.kdeplot(scaled_df['x2'], ax=ob2)
sns.kdeplot(scaled_df['x3'], ax=ob2)
plt.show()

# 2D scatter: original (green) vs scaled (red)
plt.scatter(df['x1'], df['x2'], color='g')
plt.scatter(scaled_df['x1'], scaled_df['x2'], color='r')

# 3D scatter: original vs scaled
fig = plt.figure(figsize=(8, 6))
ob3 = fig.add_subplot(121, projection='3d')
ob4 = fig.add_subplot(122, projection='3d')
ob3.scatter(df['x1'], df['x2'], df['x3'], color='red')
ob4.scatter(scaled_df['x1'], scaled_df['x2'], scaled_df['x3'], color='blue')
plt.show()
Python for Machine Learning - Part 32 - Predictions with Simple Linear Regression 3
 
04:39
'''
Topic to be covered - Simple Linear Regression

Scenario - We have the years of experience and the salary with us. We will
train the model using LinearRegression from scikit-learn and predict the salary.
'''

# Step 1 - Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 2 - Load the dataset
df_train = pd.read_csv('SalaryData_Train.csv')

# Step 3 - Inspect the data
print(df_train.head())

# Step 4 - Visualisation
plt.scatter(df_train['YearsExperience'], df_train['Salary'])
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary V/S Years of Experience')
plt.show()

# Step 5 - Feature extraction
feature = df_train.iloc[:, :-1].values
labels = df_train.iloc[:, 1].values
print('Shape of Feature:', feature.shape)
print('Shape of Labels:', labels.shape)

# Step 6 - Sampling
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases
X_train, X_test, y_train, y_test = train_test_split(feature, labels, test_size=0.3, random_state=0)

# Step 7 - Create the linear regression model
from sklearn.linear_model import LinearRegression
reg = LinearRegression()

# Step 8 - Fit the regression (reg) model with the train dataset from the
# sampling step, then plot the fitted line against the training points
reg.fit(X_train, y_train)

plt.scatter(X_train, y_train, color='r')
plt.plot(X_train, reg.predict(X_train), color='b')
plt.show()
Python for Machine Learning - Part 10 - TimeStamp Split into Year, Month, Day, Dayofweek
 
08:08
Topic to be covered: How to split a timestamp into year, month, day,
dayofweek, hours, minutes, seconds and microseconds.

Code:

# Split a timestamp into year, month, day, day of week, hour, minute,
# second and microsecond.
# Suppose the date range starts from 1st Jan, 2017 and there are 200000
# entries with a frequency of every 15 seconds.
import pandas as pd
import numpy as np

ts_value = pd.date_range('01/01/2017', periods=200000, freq='15S')

# Create the dataframe df
df = pd.DataFrame()

# Add a column with ts_value above
df['Datetime'] = ts_value

# Derive each component from the column "Datetime"
df['Year'] = df.Datetime.dt.year
df['Month'] = df.Datetime.dt.month
df['Day'] = df.Datetime.dt.day
df['Weekday_name'] = df.Datetime.dt.day_name()   # .weekday_name in older pandas
df['Hour'] = df.Datetime.dt.hour
df['Minutes'] = df.Datetime.dt.minute
df['Second'] = df.Datetime.dt.second
df['Microsecond'] = df.Datetime.dt.microsecond
Python for Machine Learning - Part 41 - Feature Engineering | Preprocessing
 
14:08
'''
Topic to be covered - Feature Engineering

SibSp - No of siblings / spouses aboard the Titanic
Parch - No of parents / children aboard the Titanic
'''

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

train = pd.read_csv('train.csv')

# Family size = siblings/spouses + parents/children + the passenger
train['Family'] = train['SibSp'] + train['Parch'] + 1
train['FamSize'] = train['Family']
train['FamSize'].loc[train['Family'] == 1] = 'Small'
train['Family'].loc[train['Family'] == 1] = 'Alone'

import matplotlib.pyplot as plt
import seaborn as sns

fig, (fig1, fig2) = plt.subplots(1, 2, figsize=(10, 5))
sns.barplot(data=train, x='Family', y='Survived', ax=fig1)
sns.countplot(data=train, x='Family', hue='Survived', ax=fig2)

fig, (fig3, fig4) = plt.subplots(1, 2, figsize=(10, 4))
sns.barplot(data=train, x='FamSize', y='Survived', ax=fig3)
sns.countplot(data=train, x='FamSize', hue='Survived', ax=fig4)
