Using the Needleman-Wunsch algorithm to draw evolutionary trees

A scoring function is denoted byOnce we define the scoring function for pairs of symbols from that alphabet, the score of the alignment between sequences, M, can be defined as an additive function of the symbol scores..So if we have a functioncharacterizing the cost of aligning two symbolsandthen we can define a function to compute the cost of an entire alignment as follows:where we sum over all positions of the alignment..We won’t go into details, but if you are interested I highly recommend the book Introduction to computational genomics A case studies approach..The choice of a scoring function is crucial in determining the score of the alignment, and this function should also be reflective of biological constraints.A convenient way of representing scoring functions is a substitution matrix..It shows the cost of replacing one letter (of either, the nucleotide or amino acid alphabet) with another letter or a gap..In our example, we’ll be using the BLOSUM50 substitution matrix:BLOSUM50 substitution matrix.Save the contents above to the file BLOSUM50.txt..Lets write the code that reads the BLOSUM50 matrix:Reading the BLOSUM50 file matrix.Ok, some code is written, lets continue with the theory now..We can now define what an optimal global alignment is.The optimal (global) alignment of two strings,andis defined as the alignmentthat maximizes the total alignment score over all possible alignments..The optimal alignment is often referred to asandis its alignment score.The naive approach in finding the optimal global alignment is calculating scores for all possible alignments and ranking them..The number of different alignments (with gaps) of two sequences of lengthisFortunately, two nice people came up with an algorithm to solve this problem, and the answer is dynamic programming and the use of the Needleman-Wunsch algorithm..The algorithm can be written as the following recursion:In words, the score of the alignment betweenandis equal to the maximum of the score of the alignment of the three prefixes plus the score for the consequent edit operation..We have all we need, now let’s write the code.The Needleman-Wunsch algorithm implementation.As you can see, the function receives two sequences, the blosum matrix and a penalty (number) as the input.. More details

Leave a Reply