8 Tricks in Python to accelerate your Data Science development

Anis Ayari, Feb 26

Let’s be short, let’s be effective.

Fewer words, more code (and more memes as illustrations).

It is what you are looking for, right? ;)

Python Language tricks

Check a condition in one line of code (assert)

As an example, we will check that the training and test DataFrames have the same columns (a common error when you predict on your test dataset without having applied the same feature engineering :)).

Common Python:

    import sys

    label = df_train['label_column']
    df_train = df_train.drop('label_column', axis=1)
    if df_train.columns.tolist() != df_test.columns.tolist():
        print('Train and test columns do not match')
        sys.exit()

Python trick:

    assert df_train.columns.tolist() == df_test.columns.tolist(), 'Train and test columns do not match'

Loop in one line of code (one-liners)

As an example, we will fill a list with the even numbers from 0 to 99 and put 0 in place of each odd number (i.e. [0, 0, 2, 0, 4, 0, …]).

Common Python:

    my_awesome_list = []
    for i in range(0, 100):
        if i % 2 == 0:
            my_awesome_list.append(i)
        else:
            my_awesome_list.append(0)
    # my_awesome_list = [0, 0, 2, 0, 4, 0, ...]

Python trick:

    my_awesome_list = [i if i % 2 == 0 else 0 for i in range(0, 100)]
    # my_awesome_list = [0, 0, 2, 0, 4, 0, ...]

Create a function without creating a function (lambda)

Calculate the double of each element of a list:

Common Python:

    a = [2, 2, 2]
    for i in range(0, len(a)):
        a[i] = a[i] * 2
    # a = [4, 4, 4]

Python trick:

    a = [2, 2, 2]
    double = lambda x: x * 2
    a = [double(a_) for a_ in a]
    # a = [4, 4, 4]

Transform a list into another list without a loop (map)

Transform a list of numbers into the list of their negatives.

Common Python:

    a = [1, 1, 1, 1]
    for i in range(0, len(a)):
        a[i] = -1 * a[i]
    # a = [-1, -1, -1, -1]

Python trick:

    a = [1, 1, 1, 1]
    a = list(map(lambda x: -1 * x, a))
    # a = [-1, -1, -1, -1]

Do not abuse syntax that looks simple at first sight (try/except and global)

A common mistake I notice in the code of a lot of junior Python developers is relying heavily on try/except and global, for example:

    def compare_two_list(list_a, list_b):
        global len_of_all_list
        try:
            for i in range(0, len_of_all_list):
                if list_a[i] != list_b[i]:
                    print('error')
        except:  # bare except: catches every error, whatever it is
            print('error')

    global len_of_all_list
    len_of_all_list = 100
    list_a = [1] * len_of_all_list
    len_of_all_list = len_of_all_list + 1
    list_b = [1] * len_of_all_list
    compare_two_list(list_a, list_b)
    # prints: error

Debugging such code is really hard: the bare except hides which error actually happened (here an IndexError, because list_a is one element shorter than the global length), and the global variable can be changed from anywhere.

Use try/except only when you need to handle a specific exception that you expect in your code.

For example, here is a way to use a try/except statement for our issue: catch only the IndexError raised once the loop goes past the last index of list_a:

    def compare_two_list(list_a, list_b):
        global len_of_all_list
        try:
            for i in range(0, len_of_all_list):
                if list_a[i] != list_b[i]:
                    print('error')
        except IndexError:  # only the error we expect: one list is shorter
            print('It seems that the two lists are different sizes. '
                  'They were similar until index {0}'.format(i - 1))
            return

    global len_of_all_list
    len_of_all_list = 100
    list_a = [1] * len_of_all_list
    len_of_all_list = len_of_all_list + 1
    list_b = [1] * len_of_all_list
    compare_two_list(list_a, list_b)
    # It seems that the two lists are different sizes. They were similar until index 99
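The fixed version above still reads the list length from a global variable. Here is a minimal sketch of the same comparison that avoids both global and try/except, assuming we only want a message describing the first difference. The name compare_lists is hypothetical, and returning a string instead of printing is a choice made for this sketch, not part of the original code:

```python
def compare_lists(list_a, list_b):
    """Describe how two lists differ, without globals or try/except (hypothetical helper)."""
    # zip stops at the shorter list, so indexing can never go out of range
    for i, (a, b) in enumerate(zip(list_a, list_b)):
        if a != b:
            return 'Lists differ at index {0}'.format(i)
    # No mismatch in the overlapping part: only the lengths can still differ
    if len(list_a) != len(list_b):
        last_common = min(len(list_a), len(list_b)) - 1
        return ('The two lists have different sizes ({0} vs {1}); '
                'they match up to index {2}'.format(len(list_a), len(list_b), last_common))
    return 'The lists are identical'


list_a = [1] * 100
list_b = [1] * 101
print(compare_lists(list_a, list_b))
# The two lists have different sizes (100 vs 101); they match up to index 99
```

Because everything the function needs is passed as an argument, its result depends only on its inputs, which is what makes it easier to debug than the global version.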

Data Science libraries tricks

Fill a DataFrame column based on a condition (np.where())

As an example, we will fill a column based on a condition: if the number of persons is equal to 1, the person is alone; otherwise, the person is not alone (for simplicity, we assume that 0 does not occur in the column ‘number_of_persons’ 🙂).

Common Python:

    df['alone'] = ''
    df.loc[df['number_of_persons'] == 1, 'alone'] = 'Yes'
    df.loc[df['number_of_persons'] != 1, 'alone'] = 'No'

Python trick:

    import numpy as np

    df['alone'] = np.where(df['number_of_persons'] == 1, 'Yes', 'No')

Get all numerical columns (pd.DataFrame.select_dtypes(include=[]))

Most Machine Learning algorithms require numerical values as input, and a pandas DataFrame offers a simple way to select those columns. You can also use select_dtypes to select any other data type you want, such as object or category.
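A minimal sketch on a made-up DataFrame (the values are hypothetical; only the column names echo the article's example) showing select_dtypes with non-numerical types, plus the 'number' shortcut, which covers every integer and float dtype rather than only int64 and float64:

```python
import pandas as pd

# Made-up toy data; only the column names echo the article's example
df = pd.DataFrame({
    'image_id': pd.Series(['0_0', '0_1', '0_2'], dtype=object),  # text stored as object
    'age': [25, 12, 40],                                          # integer column
    'gender': pd.Categorical(['F', 'M', 'F']),                    # category
    'number_of_persons': [1.0, 2.0, None],                        # float64 (None becomes NaN)
})

# Text and categorical columns, e.g. to encode them before training
text_and_categories = df.select_dtypes(include=['object', 'category'])
print(text_and_categories.columns.tolist())  # ['image_id', 'gender']

# The same columns, described as "everything that is not numeric"
non_numeric = df.select_dtypes(exclude='number')
print(non_numeric.columns.tolist())  # ['image_id', 'gender']

# 'number' matches every integer and float dtype, not only int64 and float64
numeric = df.select_dtypes(include='number')
print(numeric.columns.tolist())  # ['age', 'number_of_persons']
```

Using exclude is handy when you do not know in advance which dtypes the non-numerical columns will have.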

Common Python:

    df_train.info()

    <class 'pandas.core.frame.DataFrame'>
    Index: 819 entries, 0_0 to 2_29
    Data columns (total 4 columns):
    image_id             812 non-null object
    age                  816 non-null int64
    gender               819 non-null object
    number_of_persons    734 non-null float64
    dtypes: float64(1), int64(1), object(2)
    memory usage: 89.6+ KB

    numerical_column = ['age', 'number_of_persons']
    X_train = df_train[numerical_column]

Python trick:

    X_train = df_train.select_dtypes(include=['int64', 'float64'])

Get the inverse of a condition in a DataFrame selection ([~condition])

As an example, we will separate the minors from the adults.

Common Python:

    minor_check = df.age.isin(list(range(0, 18)))
    df_minor = df[minor_check]
    df_adult = df[not minor_check]

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Python Master:

We know that .isna() and .isin() can retrieve NaN values and values from a list inside a DataFrame column. But there is no isNOTin() method (pandas does provide .notna() for the isna() case), so here comes the ~ sign, which negates a boolean mask element by element (you can use np.invert() as well :)).
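To see what ~ does on its own, here is a small sketch on a made-up DataFrame (the ages are hypothetical, not the article's data); it also checks the np.invert() equivalence and the notna() case:

```python
import numpy as np
import pandas as pd

# Made-up toy data; only the 'age' column from the article is reproduced
df = pd.DataFrame({'age': [12, 17, 18, 40]})

minor_check = df['age'].isin(list(range(0, 18)))  # True for ages 0 to 17
adult_check = ~minor_check                         # element-wise NOT of the boolean mask

print(minor_check.tolist())  # [True, True, False, False]
print(adult_check.tolist())  # [False, False, True, True]

# np.invert gives the same element-wise negation on a boolean Series
print(adult_check.equals(np.invert(minor_check)))  # True

# The negated mask works directly as a row selection
print(df[~minor_check]['age'].tolist())  # [18, 40]

# ~ also negates isna(); for that case pandas additionally provides notna()
ages = pd.Series([12.0, np.nan])
print((~ages.isna()).equals(ages.notna()))  # True
```

One caveat: ~ flips every False, so a row for which the original mask is False for another reason (a missing age, for instance, since NaN is not in the list) also lands on the adult side.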

    minor_check = df.age.isin(list(range(0, 18)))
    df_minor = df[minor_check]
    df_adult = df[~minor_check]

Hoping these tricks will help, please do not hesitate to share yours! :)
