Understanding Python Lists

Instead, we’ll be covering when lists should be used, and their nature as objects.

If you don’t know how to create or append to a list, how to retrieve items from a list, or what slice notation is, I direct you to the official Python tutorial, posthaste.

It can be found online at http://docs.python.org/3/tutorial/.

In Python, lists should normally be used when we want to store several instances of the same type of object; lists of strings or lists of numbers; most often, lists of objects we’ve defined ourselves.

Lists should always be used when we want to store items in some kind of order.

Often, this is the order in which they were inserted, but they can also be sorted by other criteria.

Lists are also very useful when we need to modify the contents: insert to, or delete from, an arbitrary location of the list, or update a value within the list.

Like dictionaries, Python lists use an extremely efficient and well-tuned internal data structure so we can worry about what we’re storing, rather than how we’re storing it.

Many object-oriented languages provide different data structures for queues, stacks, linked lists, and array-based lists.

Python does provide specialized versions of some of these structures, for cases where optimizing access to huge sets of data is required.

Normally, however, the list data structure can serve all these purposes at once, and the coder has complete control over how they access it.
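As a rough sketch of that flexibility, the same list type can play the role of a stack or a queue. The variable names here are made up for illustration, and collections.deque is shown as one example of the specialized containers in the standard library.

```python
from collections import deque

# A plain list used as a stack: append and pop both work at the end.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()  # removes and returns the most recently added item

# A plain list used as a queue: pop(0) removes the oldest item,
# but every remaining item has to shift down one slot.
queue = ["first", "second", "third"]
oldest = queue.pop(0)

# collections.deque supports efficient appends and pops at both ends.
fast_queue = deque(["first", "second", "third"])
fast_oldest = fast_queue.popleft()
```

For small collections the plain list is usually fine; the deque is worth considering when items are removed from the front of a large collection.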

Don’t use lists for collecting different attributes of individual items.

We do not want, for example, a list of the properties a particular shape has.

Tuples, named tuples, dictionaries, and objects would all be more suitable for this purpose.

In some languages, programmers might create a list in which alternating items have different types; for example, they might write ['a', 1, 'b', 3] for our letter frequency list.

They’d have to use a strange loop that accesses two elements in the list at once or a modulus operator to determine which position was being accessed.

Don’t do this in Python.

We can group related items together using a dictionary, as we did in the previous section, or using a list of tuples.
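A minimal sketch of the difference, using the small alternating list from above: slicing and zip pair each letter with its count, after which the data can live in a list of tuples or a dictionary. The variable names are illustrative only.

```python
# Alternating layout (avoid): letters and counts share one flat list.
flat = ["a", 1, "b", 3]

# Pair each letter with the count that follows it.
pairs = list(zip(flat[0::2], flat[1::2]))

# The same data as a dictionary keyed by letter.
counts = dict(pairs)

# Looping over pairs needs no index arithmetic.
for letter, count in pairs:
    print(letter, count)
```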

Here’s a rather convoluted counter-example that demonstrates how we could perform the frequency example using a list.

It is much more complicated than the dictionary examples and illustrates the effect that choosing the right (or wrong) data structure can have on the readability of our code.

This is demonstrated as follows:

    import string

    CHARACTERS = list(string.ascii_letters) + [" "]

    def letter_frequency(sentence):
        frequencies = [(c, 0) for c in CHARACTERS]
        for letter in sentence:
            index = CHARACTERS.index(letter)
            frequencies[index] = (letter, frequencies[index][1] + 1)
        return frequencies

This code starts with a list of possible characters.

The string.ascii_letters attribute provides a string of all the letters, lowercase and uppercase, in order.

We convert this to a list and then use list concatenation (the + operator causes two lists to be merged into one) to add one more character, space.

These are the available characters in our frequency list (the code would break if we tried to add a letter that wasn’t in the list, but an exception handler could solve this).
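One way the exception handler mentioned above might look is sketched below. The function name letter_frequency_lenient is hypothetical, and silently skipping unknown characters is an assumption; raising a clearer error or extending the character list would be equally reasonable choices.

```python
import string

CHARACTERS = list(string.ascii_letters) + [" "]

def letter_frequency_lenient(sentence):
    """Count characters, skipping any that are not in CHARACTERS."""
    frequencies = [(c, 0) for c in CHARACTERS]
    for letter in sentence:
        try:
            index = CHARACTERS.index(letter)
        except ValueError:
            # list.index raises ValueError for a missing item; skip it.
            continue
        frequencies[index] = (letter, frequencies[index][1] + 1)
    return frequencies

result = letter_frequency_lenient("Hi, hi!")
# Keep only the characters that actually occurred.
nonzero = [pair for pair in result if pair[1] > 0]
```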

The first line inside the function uses a list comprehension to turn the CHARACTERS list into a list of tuples.

List comprehensions are an important, non-object-oriented tool in Python.

Then, we loop over each of the characters in the sentence.

We first look up the index of the character in the CHARACTERS list, which we know has the same index in our frequencies list, since we just created the second list from the first.

We then update that index in the frequencies list by creating a new tuple, discarding the original one.

Aside from garbage collection and memory waste concerns, this is rather difficult to read!

Like dictionaries, lists are objects too, and they have several methods that can be invoked upon them.

Here are some common ones:

The append(element) method adds an element to the end of the list.

The insert(index, element) method inserts an item at a specific position.

The count(element) method tells us how many times an element appears in the list.

The index() method tells us the index of an item in the list, raising an exception if it can’t find it.

Lists have no find() method (that one belongs to strings, where it returns -1 for a missing substring), so check membership with the in keyword before calling index() if an exception is unwanted.

The reverse() method does exactly what it says: it turns the list around.

The sort() method has some rather intricate object-oriented behaviours, which we’ll cover now.

Sorting lists

Without any parameters, the sort will generally do as expected.

If it’s a list of strings, it will place them in alphabetical order.

This operation is case sensitive so all capital letters will be sorted before lowercase letters; that is, Z comes before a.

If it’s a list of numbers, they will be sorted in numerical order.

If a list of tuples is provided, the list is sorted by the first element in each tuple, with ties broken by the elements that follow.

If a mixture containing unsortable items is supplied, the sort will raise a TypeError exception.
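The default behaviours described above can be checked with a short sketch like the following; the sample values are arbitrary.

```python
words = ["banana", "Zebra", "apple"]
words.sort()   # case sensitive: uppercase letters sort before lowercase

numbers = [3, 1.5, 2]
numbers.sort()  # numerical order, ints and floats compare fine

pairs = [("b", 1), ("a", 2), ("a", 1)]
pairs.sort()   # compares first items, then later items on ties

mixed = [1, "two", 3]
try:
    mixed.sort()
    sort_failed = False
except TypeError:
    # int and str cannot be ordered against each other
    sort_failed = True
```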

If we want to place objects we define ourselves into a list and make those objects sortable, we have to do a bit more work.

The special __lt__ method, which stands for less than, should be defined on the class to make instances of that class comparable.

The sort method on the list will access this method on each object to determine where it goes in the list.

This method should return True if our class is somehow less than the passed parameter, and False otherwise.

Here’s a rather silly class that can be sorted based on either a string or a number:

    class WeirdSortee:
        def __init__(self, string, number, sort_num):
            self.string = string
            self.number = number
            self.sort_num = sort_num

        def __lt__(self, object):
            if self.sort_num:
                return self.number < object.number
            return self.string < object.string

        def __repr__(self):
            return f"{self.string}:{self.number}"

The __repr__ method makes it easy to see the two values when we print a list.

The __lt__ method’s implementation compares the object to another instance of the same class (or any duck-typed object that has string, number, and sort_num attributes; it will fail if those attributes are missing).

The following output illustrates this class in action when it comes to sorting:

    >>> a = WeirdSortee('a', 4, True)
    >>> b = WeirdSortee('b', 3, True)
    >>> c = WeirdSortee('c', 2, True)
    >>> d = WeirdSortee('d', 1, True)
    >>> l = [a,b,c,d]
    >>> l
    [a:4, b:3, c:2, d:1]
    >>> l.sort()
    >>> l
    [d:1, c:2, b:3, a:4]
    >>> for i in l:
    ...     i.sort_num = False
    ...
    >>> l.sort()
    >>> l
    [a:4, b:3, c:2, d:1]

The first time we call sort, it sorts by numbers because sort_num is True on all the objects being compared.

The second time, it sorts by letters.

The __lt__ method is the only one we need to implement to enable sorting.

Technically, however, if it is implemented, the class should normally also implement the similar __gt__, __eq__, __ne__, __ge__, and __le__ methods so that all of the <, >, ==, !=, >=, and <= operators also work properly.

You can get this for free by implementing __lt__ and __eq__, and then applying the @total_ordering class decorator to supply the rest:

    from functools import total_ordering

    @total_ordering
    class WeirdSortee:
        def __init__(self, string, number, sort_num):
            self.string = string
            self.number = number
            self.sort_num = sort_num

        def __lt__(self, object):
            if self.sort_num:
                return self.number < object.number
            return self.string < object.string

        def __repr__(self):
            return f"{self.string}:{self.number}"

        def __eq__(self, object):
            # Two instances are equal only if all three attributes match.
            return all((
                self.string == object.string,
                self.number == object.number,
                self.sort_num == object.sort_num,
            ))

This is useful if we want to be able to use operators on our objects.

However, if all we want to do is customize our sort orders, even this is overkill.

For such a use case, the sort method can take an optional key argument.

This argument is a function that can translate each object in a list into an object that can somehow be compared.

For example, we can use str.lower as the key argument to perform a case-insensitive sort on a list of strings, as can be seen in the following:

    >>> l = ["hello", "HELP", "Helo"]
    >>> l.sort()
    >>> l
    ['HELP', 'Helo', 'hello']
    >>> l.sort(key=str.lower)
    >>> l
    ['hello', 'Helo', 'HELP']

Remember, even though lower is a method on string objects, it is also a function that can accept a single argument, self.

In other words, str.lower(item) is equivalent to item.lower().

When we pass this function as a key, it performs the comparison on lowercase values instead of doing the default case-sensitive comparison.

There are a few sort key operations that are so common that the Python team has supplied them so you don’t have to write them yourself.

For example, it is common to sort a list of tuples by something other than the first item in each tuple.

The operator.itemgetter function can be used to build a key that does this:

    >>> from operator import itemgetter
    >>> l = [('h', 4), ('n', 6), ('o', 5), ('p', 1), ('t', 3), ('y', 2)]
    >>> l.sort(key=itemgetter(1))
    >>> l
    [('p', 1), ('y', 2), ('t', 3), ('h', 4), ('o', 5), ('n', 6)]

The itemgetter function is the most commonly used one (it works if objects are dictionaries, too), but you will sometimes find use for attrgetter and methodcaller, which return attributes on an object and the results of method calls on objects for the same purpose.

Refer to the operator module documentation for more information.
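As a brief sketch of the other two helpers, the following uses attrgetter, methodcaller, and itemgetter on a dictionary. The Point type and the sample rows are hypothetical and exist only for this illustration.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter, methodcaller

# Hypothetical record type, used only for this illustration.
Point = namedtuple("Point", ["x", "y"])

points = [Point(1, 3), Point(2, 1), Point(3, 2)]
points.sort(key=attrgetter("y"))  # sort by the y attribute

words = ["hello", "HELP", "Helo"]
words.sort(key=methodcaller("lower"))  # calls word.lower() for each key

rows = [{"name": "b", "n": 2}, {"name": "a", "n": 1}]
rows.sort(key=itemgetter("n"))  # itemgetter also works on dictionaries
```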

Sets

Lists are extremely versatile tools that suit many container object applications.

But they are not useful when we want to ensure that objects in the list are unique.

For example, a song library may contain many songs by the same artist.

If we want to sort through the library and create a list of all the artists, we would have to check the list to see whether we’ve added the artist already, before we add them again.

This is where sets come in.

Sets come from mathematics, where they represent an unordered group of (usually) unique numbers.

We can add a number to a set five times, but it will show up in the set only once.

In Python, sets can hold any hashable object, not just numbers.

Hashable objects are the same objects that can be used as keys in dictionaries; so again, lists and dictionaries are out.

Like mathematical sets, they can store only one copy of each object.

So if we’re trying to create a list of song artists, we can create a set of string names and simply add them to the set.

This example starts with a list of (song, artist) tuples and creates a set of the artists:

    song_library = [
        ("Phantom Of The Opera", "Sarah Brightman"),
        ("Knocking On Heaven's Door", "Guns N' Roses"),
        ("Captain Nemo", "Sarah Brightman"),
        ("Patterns In The Ivy", "Opeth"),
        ("November Rain", "Guns N' Roses"),
        ("Beautiful", "Sarah Brightman"),
        ("Mal's Song", "Vixy and Tony"),
    ]

    artists = set()
    for song, artist in song_library:
        artists.add(artist)

    print(artists)

There is no built-in syntax for an empty set as there is for lists and dictionaries; we create a set using the set() constructor.

However, we can use the curly braces (borrowed from dictionary syntax) to create a set, so long as the set contains values.

If we use colons to separate pairs of values, it’s a dictionary, as in {'key': 'value', 'key2': 'value2'}.

If we just separate values with commas, it’s a set, as in {'value', 'value2'}.
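A quick sketch makes the distinction concrete; the variable names are illustrative.

```python
empty_dict = {}      # braces alone create a dictionary, not a set
empty_set = set()    # an empty set needs the constructor

# Braces with comma-separated values create a set; duplicates collapse.
artists = {"Opeth", "Opeth", "Sarah Brightman"}

# Braces with key: value pairs create a dictionary.
mapping = {"key": "value", "key2": "value2"}
```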

Items can be added individually to the set using its add method.

If we run this script, we see that the set works as advertised:

    {'Sarah Brightman', "Guns N' Roses", 'Vixy and Tony', 'Opeth'}

If you’re paying attention to the output, you’ll notice that the items are not printed in the order they were added to the set.

Sets are inherently unordered because they use a hash-based data structure for efficiency.

Because of this lack of ordering, sets cannot have items looked up by index.

The primary purpose of a set is to divide the world into two groups: things that are in the set, and things that are not in the set.

It is easy to check whether an item is in a set or to loop over the items in a set, but if we want to sort or order them, we have to convert the set to a list.

This output shows all three of these activities:

    >>> "Opeth" in artists
    True
    >>> for artist in artists:
    ...     print("{} plays good music".format(artist))
    ...
    Sarah Brightman plays good music
    Guns N' Roses plays good music
    Vixy and Tony plays good music
    Opeth plays good music
    >>> alphabetical = list(artists)
    >>> alphabetical.sort()
    >>> alphabetical
    ["Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony']

While the primary feature of a set is uniqueness, that is not its primary purpose.

Sets are most useful when two or more of them are used in combination.

Most of the methods on the set type operate on other sets, allowing us to efficiently combine or compare the items in two or more sets.

These methods have strange names, since they use the terminology used in mathematics.

We’ll start with three methods that return the same result, regardless of which is the calling set and which is the called set.

The union method is the most common and easiest to understand.

It takes a second set as a parameter and returns a new set that contains all elements that are in either of the two sets; if an element is in both original sets, it will, of course, only show up once in the new set.

Union is like a logical or operation.

Indeed, the | operator can be used on two sets to perform the union operation, if you don’t like calling methods.

Conversely, the intersection method accepts a second set and returns a new set that contains only those elements that are in both sets.

It is like a logical and operation, and can also be referenced using the & operator.

Finally, the symmetric_difference method tells us what’s left; it is the set of objects that are in one set or the other, but not both.

The following example illustrates these methods by comparing some artists preferred by two different people:

    first_artists = {
        "Sarah Brightman",
        "Guns N' Roses",
        "Opeth",
        "Vixy and Tony",
    }
    second_artists = {"Nickelback", "Guns N' Roses", "Savage Garden"}

    print("All: {}".format(first_artists.union(second_artists)))
    print("Both: {}".format(second_artists.intersection(first_artists)))
    print(
        "Either but not both: {}".format(
            first_artists.symmetric_difference(second_artists)
        )
    )

If we run this code, we see that these three methods do what the print statements suggest they will do:

    All: {'Sarah Brightman', "Guns N' Roses", 'Vixy and Tony', 'Savage Garden', 'Opeth', 'Nickelback'}
    Both: {"Guns N' Roses"}
    Either but not both: {'Savage Garden', 'Opeth', 'Nickelback', 'Sarah Brightman', 'Vixy and Tony'}

These methods all return the same result, regardless of which set calls the other.

We can say first_artists.union(second_artists) or second_artists.union(first_artists) and get the same result.
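The operator forms of these three methods can be sketched as follows, reusing the two sets from the example. The ^ operator is the operator spelling of symmetric_difference.

```python
first_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
second_artists = {"Nickelback", "Guns N' Roses", "Savage Garden"}

all_artists = first_artists | second_artists      # union
both = first_artists & second_artists             # intersection
either_not_both = first_artists ^ second_artists  # symmetric difference

# Swapping the operands gives the same sets.
same_union = (second_artists | first_artists) == all_artists
same_sym_diff = (second_artists ^ first_artists) == either_not_both
```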

There are also methods that return different results depending on who is the caller and who is the argument.

These methods include issubset and issuperset, which are the inverse of each other.

Both return a bool.

The issubset method returns True if all of the items in the calling set are also in the set passed as an argument.

The issuperset method returns True if all of the items in the argument are also in the calling set.

Thus, s.issubset(t) and t.issuperset(s) are identical.

They will both return True if t contains all the elements in s.

Finally, the difference method returns all the elements that are in the calling set, but not in the set passed as an argument; this is like half a symmetric_difference.

The difference method can also be represented by the - operator.

The following code illustrates these methods in action:

    first_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
    bands = {"Guns N' Roses", "Opeth"}

    print("first_artists is to bands:")
    print("issuperset: {}".format(first_artists.issuperset(bands)))
    print("issubset: {}".format(first_artists.issubset(bands)))
    print("difference: {}".format(first_artists.difference(bands)))
    print("*"*20)
    print("bands is to first_artists:")
    print("issuperset: {}".format(bands.issuperset(first_artists)))
    print("issubset: {}".format(bands.issubset(first_artists)))
    print("difference: {}".format(bands.difference(first_artists)))

This code simply prints out the response of each method when called from one set on the other.

Running it gives us the following output:

    first_artists is to bands:
    issuperset: True
    issubset: False
    difference: {'Sarah Brightman', 'Vixy and Tony'}
    ********************
    bands is to first_artists:
    issuperset: False
    issubset: True
    difference: set()

The difference method, in the second case, returns an empty set, since there are no items in bands that are not in first_artists.

The union, intersection, and difference methods can all take multiple sets as arguments; they will return, as we might expect, the set that is created when the operation is called on all the parameters.
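A small sketch with three arbitrary sets of numbers shows the multi-argument forms.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}

everything = a.union(b, c)      # items in any of the three sets
common = a.intersection(b, c)   # items in all three sets
only_a = a.difference(b, c)     # items in a but in neither b nor c
```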

So, the methods on sets clearly suggest that sets are meant to operate on other sets, and that they are not just containers.

If we have data coming in from two different sources and need to quickly combine them in some way, so as to determine where the data overlaps or is different, we can use set operations to efficiently compare them.

Or, if we have data incoming that may contain duplicates of data that has already been processed, we can use sets to compare the two and process only the new data.
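The duplicate-filtering case might look something like the sketch below. The record IDs are made-up placeholder values, and the loop body stands in for whatever real processing would happen.

```python
# Hypothetical IDs of records that were handled earlier.
processed_ids = {101, 102, 103}

# Incoming batch, possibly with repeats and already-seen IDs.
incoming = [103, 104, 104, 105, 101]

# Set difference keeps only the IDs not yet processed.
new_ids = set(incoming) - processed_ids

handled = []
for record_id in sorted(new_ids):
    handled.append(record_id)      # placeholder for real processing
    processed_ids.add(record_id)   # remember it for the next batch
```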

Finally, it is valuable to know that sets are much more efficient than lists when checking for membership using the in keyword.

If you use the value in container syntax on a set or a list, it will return True if one of the elements in container is equal to value, and False otherwise.

However, in a list, it will look at every object in the container until it finds the value, whereas in a set, it simply hashes the value and checks for membership.

This means that a set will find the value in the same amount of time no matter how big the container is, but a list will take longer and longer to search for a value as the list contains more and more values.
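To get a rough feel for the difference, a timing sketch along these lines can be run. The container size and repetition count are arbitrary choices, and the actual timings will vary by machine.

```python
import timeit

size = 100_000
as_list = list(range(size))
as_set = set(as_list)

# A value that is absent is the worst case for the list:
# every element is compared before the search gives up.
missing = -1

list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Increasing size should make the list figure grow roughly in proportion, while the set figure should stay about the same.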
