The gist of Python

Chances are, someone has made something that solves what you are trying to do.

Moreover, as Python is open source and the community is very generous with sharing, you are encouraged to borrow, tweak, break and improve other people’s code.

OK, that covers a basic overview of the language. If you are interested in what makes Python tick, where it came from and where it is used, please go and read Python’s Wikipedia page.

You might be surprised to find that a lot of big applications like Instagram, Spotify, Netflix, Reddit and YouTube all have Python somewhere in the mix.

Data Types

Python has a few core data types on which everything else builds, and I like to think of data types as layers in the OSI model.

If you are not familiar with the OSI model, it’s a conceptual abstraction of layers that enable protocol and interface compatibility.

A quick example of the OSI model is a fibre cable serving as layer 1 (the physical layer) and the Ethernet protocol (layer 2) running on top of layer 1.

The abstraction works because the Ethernet protocol doesn’t care what layer 1 is: it could be fibre, microwave or even smoke signals.

The idea is that higher level layers assume that the lower layers do their job well and you can build upwards from there.

Similarly, we can build much more complex data types from the base data types in Python. And, take a guess: all of these data types are also objects with a suite of built-in methods, as with str.

The basic Python data types are:

- Integers
- Floats
- Complex numbers
- Strings
- Booleans

Below is an example of each.

    >>> my_int = 300
    >>> my_float = 300.3
    >>> my_complex = 1 + 3j
    >>> my_string = "We've seen this one before"
    >>> my_bool = True

We also get composite data types, some of which are:

- Lists
- Dictionaries
- Sets
- Tuples

These composite data types are all collections (“arrays”) of values, but each behaves slightly differently.
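Going back to the five basic types listed above, a quick way to confirm what Python made of each literal is the built-in type(). This is just a small sketch reusing the example values from the text.

```python
# The five basic values from the examples above
my_int = 300
my_float = 300.3
my_complex = 1 + 3j
my_string = "We've seen this one before"
my_bool = True

for value in (my_int, my_float, my_complex, my_string, my_bool):
    # type() returns the class of the value; __name__ is its short name
    print(type(value).__name__)
# Prints, one per line: int, float, complex, str, bool

# Complex numbers expose their parts as float attributes
print(my_complex.real, my_complex.imag)  # 1.0 3.0
```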

Unlike some other languages, Python doesn’t mind if the types within a collection are different.

In other words, we can put strings and booleans and complex numbers all in the same list and iterate over them.
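As a small illustration of mixing types, the list below holds a string, a boolean, a complex number, an integer and a float, and a plain for loop walks over all of them. The values themselves are arbitrary.

```python
# One list, five different types
mixed = ["some words", True, 1 + 3j, 42, 3.14]

for item in mixed:
    # Each element keeps its own type; the list does not care
    print(repr(item), "->", type(item).__name__)
# 'some words' -> str
# True -> bool
# (1+3j) -> complex
# 42 -> int
# 3.14 -> float
```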

The automatic choice of data type brings us to golden rule number 4: Python infers data types. This can be a blessing or a curse, but you never have to tell Python that you are about to declare an integer; it will infer it.

Sometimes this is desirable and sometimes it is not; for the times when it is not, you can be explicit about the type you want.
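One common way of being explicit is to convert values with the built-in type constructors. A minimal sketch, assuming “explicit” here means converting values rather than, say, adding type hints:

```python
# Python picks the type from the literal you write...
inferred = 300           # an int

# ...but the built-in constructors let you ask for a specific type
as_float = float(300)    # 300.0
as_int = int("42")       # parses a string into the int 42
as_str = str(3.5)        # the string "3.5"
truncated = int(7.9)     # int() truncates toward zero: 7

print(type(inferred).__name__, type(as_float).__name__)  # int float
```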

Back to composite data types. Below is an example of each of the four mentioned above.

    >>> my_list = [1, True, 'some words']
    >>> my_tuple = tuple(my_list)
    >>> my_set = set(my_list)
    >>> my_dictionary = {'elephant': 'A large, five-toed animal.',
    ...                  'go-cart': 'A small carriage for young children to ride in.'}

The differences between these four collection types are as follows.

Lists are the most commonly used and, as the name suggests, are just a list of things. More importantly, lists are mutable, which means that you can change the value at index 2 (the third item, because Python counts from zero) if you would so please: my_list[2] = 'other words'.
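Here is that mutation spelled out, reusing my_list from the example above. Note that index 2 is the third element, since indexing starts at zero.

```python
my_list = [1, True, 'some words']

# Index 2 is the third item
print(my_list[2])        # some words

# Lists are mutable: replace an item in place
my_list[2] = 'other words'
print(my_list)           # [1, True, 'other words']

# ...and they can also grow
my_list.append(3.5)
print(len(my_list))      # 4
```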

Tuples are almost identical to lists, except for one critical difference: they are immutable.

Immutable means that you cannot change the value at index 2 (or any other value) in the tuple, even if you want to.

Executing my_tuple[2] = 'other words' will raise a TypeError.
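The sketch below triggers that error on purpose and catches it, then shows the usual workaround: build a new tuple instead of modifying the old one.

```python
my_tuple = tuple([1, True, 'some words'])

raised = False
try:
    my_tuple[2] = 'other words'   # tuples do not support item assignment
except TypeError as err:
    raised = True
    print("TypeError:", err)      # TypeError: 'tuple' object does not support item assignment

# Reading works exactly as with a list
print(my_tuple[2])                # some words

# To "change" a tuple, create a new one from slices of the old one
updated = my_tuple[:2] + ('other words',)
print(updated)                    # (1, True, 'other words')
```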

Sets are different as they only contain unique values; in other words, if we cast a list with four values, [1, 2, 3, 1], to a set, the result is {1, 2, 3}.

Note that the order of values in a set isn’t guaranteed and might change when you cast a list to a set.
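A short sketch of the deduplication described above. Because set order isn’t guaranteed, it compares sets with == and uses sorted() when a predictable order is needed, rather than printing the set directly.

```python
values = [1, 2, 3, 1]
unique = set(values)

print(unique == {1, 2, 3})   # True: the duplicate 1 is gone
print(len(unique))           # 3

# Fast membership tests are a typical use for sets
print(2 in unique)           # True

# Need a predictable order? Sort explicitly instead of relying on set order
print(sorted(unique))        # [1, 2, 3]
```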

Lastly, dictionaries are key-value stores.

They have a key, in our example the word, and an associated value, in our example the definition of the word.

Note that keys are unique: if you declare the same key more than once, only the last key-value pair is stored.
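The dictionary from the earlier example, with a lookup by key and a demonstration of the “last duplicate key wins” rule. The .get() call with a default is one common way to avoid a KeyError for missing keys; 'zebra' is just a made-up missing key.

```python
my_dictionary = {
    'elephant': 'A large, five-toed animal.',
    'go-cart': 'A small carriage for young children to ride in.',
}

# Look up a value (the definition) by its key (the word)
print(my_dictionary['elephant'])   # A large, five-toed animal.

# Declaring the same key twice keeps only the last value
duplicates = {'go-cart': 'first definition', 'go-cart': 'second definition'}
print(duplicates)                  # {'go-cart': 'second definition'}

# .get() returns a default instead of raising KeyError
print(my_dictionary.get('zebra', 'not found'))   # not found
```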

Ok, that wraps up most of the basics regarding base data structures in Python.

Again, all of these are objects and have their associated attributes and methods (the nice built-in functions like str.split()), and you can inspect these attributes by running dir(my_list), for example.
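A small sketch of that inspection. dir() also returns many “dunder” names (starting with an underscore), so the example filters those out to leave the everyday list methods.

```python
my_list = [1, True, 'some words']

# dir() returns a sorted list of attribute and method names on the object
public = [name for name in dir(my_list) if not name.startswith('_')]
print(public)
# ['append', 'clear', 'copy', 'count', 'extend', 'index',
#  'insert', 'pop', 'remove', 'reverse', 'sort']

# str.split() in action: a method called on a string object
print("We've seen this one before".split())
# ["We've", 'seen', 'this', 'one', 'before']
```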

Nomenclature

So I’ve been irritating some seasoned Pythoners by calling some objects that are attributes “methods” and by calling libraries “packages”.

This section elaborates a bit on what the “correct” names for objects are, but when you are just starting with Python, think of everything as an object. No, seriously. Everything!

A function is a reusable block of code that you declare with the def keyword, for example:

    def my_function(argument_1, another_argument):
        """A function that adds two arguments together."""
        return argument_1 + another_argument

Note the indentation (whitespace) grouping all the internals of the function together.
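To back up the “everything is an object” claim, the sketch below calls my_function and then inspects the function itself, which has a type and attributes (such as its docstring) like any other object.

```python
def my_function(argument_1, another_argument):
    """A function that adds two arguments together."""
    return argument_1 + another_argument


print(my_function(2, 3))            # 5
print(my_function('ab', 'cd'))      # abcd ('+' also joins strings)

# Functions are objects too
print(type(my_function).__name__)   # function
print(my_function.__doc__)          # A function that adds two arguments together.
```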

A class is a blueprint created by a programmer for an object.

A class defines a set of attributes that characterise an object that gets instantiated from this class, for example:

    class Person:
        def __init__(self, name, surname):
            self.name = name
            self.surname = surname

        def get_full_name(self):
            return self.name + ' ' + self.surname

Here we’ve created a blueprint for the Person class; notice the capital letter, which I’ll touch on a bit later.

We can now create various people using our Person class; each instance of the Person class is called an object.

An object is the realised version of the class, where the class is just the blueprint defined in the program, for example:

    p1 = Person('John', 'Doe')
    p2 = Person('Miley', 'Cyrus')

In this class-versus-object sense, only p1 and p2 are objects (instances of Person).

We can now do interesting things with these instantiated objects, like ask for their name, surname or full name, for example:

    >>> print(p1.name)
    John
    >>> print(p2.get_full_name())
    Miley Cyrus

Now for some more technical jargon.
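Put together, the class, the two instances and the attribute and method calls above form one runnable script. The docstring, the comments and the isinstance() check are additions for illustration.

```python
class Person:
    """Blueprint for a person with a name and a surname."""

    def __init__(self, name, surname):
        # Runs when a new object is created; stores the arguments on it
        self.name = name
        self.surname = surname

    def get_full_name(self):
        return self.name + ' ' + self.surname


p1 = Person('John', 'Doe')
p2 = Person('Miley', 'Cyrus')

print(p1.name)                  # John
print(p2.get_full_name())       # Miley Cyrus
print(isinstance(p1, Person))   # True: p1 is an instance of Person
```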

Take the get_full_name() function that we declared in our class.

We’re not allowed to call that a function anymore.

Because it is inside a class, the correct term for it is a method.

Similarly, we don’t call name and surname variables: because they live inside an object of the class, we refer to them as attributes.
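Python’s own introspection reflects this vocabulary. In the sketch below, vars() lists the attributes stored on an object, and type() shows that get_full_name accessed through an object is a bound method, while the same name looked up on the class is still a plain function.

```python
class Person:
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

    def get_full_name(self):
        return self.name + ' ' + self.surname


p1 = Person('John', 'Doe')

# Attributes: the data stored on the object
print(vars(p1))                               # {'name': 'John', 'surname': 'Doe'}

# Accessed through an object, get_full_name is a bound method...
print(type(p1.get_full_name).__name__)        # method
# ...while on the class itself it is still a plain function
print(type(Person.get_full_name).__name__)    # function
```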

Some other terms you might run into are module, package, library and framework, which I briefly discuss below.

A module is a file that contains Python functions, global variables, etc. It is nothing more than a .py file containing executable Python code.

A package is a namespace that contains multiple packages or modules. It is a directory that contains a special file called __init__.py (since Python 3.3, so-called namespace packages can also work without it).
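To make the module/package distinction concrete, here is a throwaway sketch that writes a tiny package to a temporary directory and imports a module from it. The names mypkg, greetings and hello are hypothetical, invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package "mypkg" containing a module "greetings"
root = Path(tempfile.mkdtemp())
package_dir = root / "mypkg"
package_dir.mkdir()

# An (empty) __init__.py marks the directory as a regular package
(package_dir / "__init__.py").write_text("")

# A module is just a .py file with Python code in it
(package_dir / "greetings.py").write_text(
    "def hello(name):\n"
    "    return 'Hello, ' + name\n"
)

# Make the folder holding the package importable, then import the module
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # files were created after start-up
greetings = importlib.import_module("mypkg.greetings")

print(greetings.hello("World"))   # Hello, World
```

In a real project you would simply create these files in your project folder and write `from mypkg import greetings`.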

A library is a collection of various packages.

Conceptually, there is no real difference between a package and a Python library, and the two terms are often used interchangeably.

A framework is a collection of libraries that dictates the overall architecture and flow of your code.

However, when you are starting out, don’t worry about all of this jargon: call everything an object and learn what is essential. At least now you know what to call things if you’d like to be technically correct.

Packages

I mentioned above that someone has probably built whatever you need in a package or library somewhere.

But where? Moreover, how can you leverage all the hard work other people have put in? Well, by installing packages.

Installing packages can be done in two ways: using pip or using conda.

The former is the pure Python way to install packages, fetching them from the Python Package Index (PyPI).

pip stands for Package Installer for Python and is itself a Python package that you can install; see how everything is just an object?

Conda, on the other hand, is a package and environment manager that is not limited to Python packages.

It comes from the Anaconda project, which aims to bundle commonly used Python resources together into a bigger snake.

If you are just starting, I’d advise you to download Anaconda: it comes with all the things you’ll need to get going out of the box, and you don’t have to fiddle with environment variables and such.

PEPs

Python Enhancement Proposals (PEPs) describe conventions and rules that I advise you to follow from the start.

You can name your variables or your classes anything, but if everybody sticks to the conventions, then it is obvious what you are talking about without any comments.

One of these PEPs is PEP 20, called The Zen of Python, and you can print it out in any Python interpreter by running the following code:

    >>> import this

Executing the above command prints out PEP 20, which reads:

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

For me, PEP 20 summarises the culture surrounding Python and gives some guidance when you have to make a difficult design choice.

You can find all the PEPs in the official PEP index, but one I’d like to take some time with is PEP 8, titled the Style Guide for Python Code.

Without going into too much detail, the gist of PEP 8 is: code is read much more often than it is written.

The guidelines provided in PEP8 are intended to improve the readability of code and make it consistent across the broad spectrum of Python code.

As PEP 20 says, “Readability counts”.

Below are some highlights of the code styling conventions:

- Indentation: use 4 spaces per indentation level.
- Tabs or spaces? Spaces are the preferred indentation method.
- Maximum line length? Limit all lines to a maximum of 79 characters.
- Blank lines? Surround top-level function and class definitions with two blank lines.
- Names to avoid? Never use the characters ‘l’ (lowercase letter el), ‘O’ (uppercase letter oh), or ‘I’ (uppercase letter eye) as single-character variable names.
- Package and module names? Modules should have short, all-lowercase names.
- Class names? Class names should normally use the CapWords convention.
- Function and variable names? Function names should be lowercase, with words separated by underscores as necessary to improve readability.
- Constants? Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.
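Several of these conventions fit in one small module. The sketch below is hypothetical (MAX_READINGS, TemperatureReading and average_celsius are invented names) and shows a module-level constant, a CapWords class, lowercase_with_underscores functions and methods, 4-space indentation and two blank lines between top-level definitions.

```python
MAX_READINGS = 3  # constant: all capitals, words separated by underscores


class TemperatureReading:  # class name: CapWords
    """A single temperature measurement in degrees Celsius."""

    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):  # method name: lowercase_with_underscores
        return self.celsius * 9 / 5 + 32


def average_celsius(readings):  # function name: lowercase_with_underscores
    """Average the first MAX_READINGS readings."""
    used = readings[:MAX_READINGS]
    return sum(reading.celsius for reading in used) / len(used)


print(TemperatureReading(100).to_fahrenheit())   # 212.0
```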

Data Science Stack

Right, you’ve come a long way.

Hopefully, by now you are more comfortable with Python and can see the subtle conventions that tell you more about what you are working with, without anyone having to be explicit about it.

For example, if I see from sklearn.preprocessing import StandardScaler, I intuitively know that StandardScaler is a class that I have to instantiate, purely because it is written in CapWords.

Moving along to the tools you’ll need for data ingestion, manipulation and visualisation.

Shockingly, there are packages for all of these tasks, and the five main packages I want to mention here are:

- numpy
- matplotlib
- pandas
- seaborn
- pickle

NumPy, short for Numerical Python, is the fundamental package for scientific computing with Python. It contains, among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools in Python.

The primary data type in pandas is the DataFrame, which is basically a table consisting of rows and columns.

Moreover, as always, DataFrames have some fantastic methods that are bound to change the way you manipulate data forever.

Seaborn is a data visualization library built on top of matplotlib.

It provides a high-level interface for drawing attractive and informative statistical graphics.

As Seaborn is a pretty wrapper built on top of matplotlib, you can always use lower-level matplotlib functions to fine-tune your Seaborn plots if you require some funky additions.

The pickle module implements binary protocols for serialising and de-serialising a Python object structure.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation.

Say you’ve worked a lot on cleaning a DataFrame and you would like to make a checkpoint that saves its current state, data types of each column included. Then you’d pickle the DataFrame and reuse it later on.
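A minimal checkpoint round trip with the standard library pickle module. To keep the sketch dependency-free it pickles a plain dictionary standing in for a cleaned dataset (the values and the file name checkpoint.pkl are made up); the same pickle.dump/pickle.load calls work on a DataFrame, and pandas also offers DataFrame.to_pickle and pandas.read_pickle for this.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a cleaned dataset (made-up values)
checkpoint = {
    'columns': ['item', 'quantity'],
    'rows': [('widget', 3), ('gadget', 5)],
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'checkpoint.pkl')  # hypothetical file name

    # Pickling: serialise the object to a binary file ('wb' = write bytes)
    with open(path, 'wb') as f:
        pickle.dump(checkpoint, f)

    # Unpickling: rebuild an equal object from the file ('rb' = read bytes)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == checkpoint)               # True
print(type(restored['rows'][0]).__name__)   # tuple: types survive the round trip
```

One caution: only unpickle files you trust, because loading a pickle can execute arbitrary code.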

The gist of this section is: if you are interested in data ingestion, manipulation and visualisation, familiarise yourself with the following package imports and watch a few tutorial videos on these packages.

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    import pickle

IDE

An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. There are many options for Python, but which would I advise?

Anaconda ships with two great IDEs, Jupyter notebooks and Spyder, and I’d recommend you start using them to get coding right away.

There are, however, times when one is better suited than the other.

Ultimately, it comes down to personal preference and workflow, but I like to do the exploratory part of coding in a notebook and later copy and reduce the notebook sandbox code into a production-ready script.

I prefer to use Atom instead of Spyder, but Spyder will get the job done.

Concluding Thoughts

I hope that this blog has given you some insights into why Python is such a popular language nowadays.

This blog hasn’t even scratched the surface of all the subtle easter eggs locked away in Python, but these are best discovered on your own.

Ultimately Python is more than just a programming language; it is a means to express your creativity through code by abstracting away a lot of the repetitive, tedious coding tasks found in lower level languages.

Don’t get me wrong: there is a place for optimised C code, but what Python loses in performance, it makes up tenfold in ease of use.
