Chaos Game Representation of a genetic sequence

If we define $s$ as a subsequence (a k-mer) in the gene structure, the probability of the occurrence of $s$ is

$$P(s) = \frac{N_s}{L - k + 1}$$

where $L$ is the length of the sequence, $k$ is the length of the k-mer, and $N_s$ is the number of occurrences of the k-mer. The denominator $L - k + 1$ is the number of overlapping windows of length $k$ in the sequence.

Now let's write some Python code that calculates this for us.

(Listing: Calculating the probabilities.)

Now the basics for the chaos game representation are done. Next, let's explain what CGR is. CGR is a generalized, scale-independent Markov probability table for the sequence, and oligomer tables can be deduced from the CGR image. CGR is generated by the following procedure:

(Figure: CGR pseudo code.)

The program we are trying to write will create an image showing the abundance of all k-mers (oligonucleotides of length k) in a given sequence. For example, for tetramers (k=4), the resulting image will be composed of $4^4 = 256$ boxes, each representing an oligomer. The oligomer name and abundance are written within these boxes, and abundance is also visualized with the box color, from white (none) to black (highly frequent). This k-mer table is alternatively known as the FCGR (frequency matrices extracted from Chaos Game Representation).

(Figure: A k-mer table example for CGR.)

If you are asking yourself why 4 to the power of 4, this is basic probability and statistics. If we have a k-mer of length 4, and four possible values for each spot (A, C, G, T), then we have $4^4 = 256$ possible combinations. Basically, we are building a two-dimensional array. We can calculate the length of each side using $\sqrt{4^k} = 2^k$, where $k$ is the length of the k-mer, so for tetramers the array is 16 by 16.

Now we need to calculate the position of the oligomers. The position of an oligomer can be located recursively as follows: for each letter in the oligomer, a box is subdivided into four quadrants, where A is upper left, T is lower right, G is upper right, and C is lower left.
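The steps described so far can be sketched in Python as follows. This is a minimal illustration, not the original listing: the function names (`count_kmers`, `kmer_probabilities`, `kmer_position`, `fcgr_matrix`) are made up for this example, windows containing anything other than A, C, G, T are simply skipped, and the first letter of a k-mer is assumed to select the largest quadrant.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer; skip windows with non-ACGT symbols (assumption)."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_probabilities(counts, sequence_length, k):
    """Divide each count by the number of windows, L - k + 1."""
    windows = sequence_length - k + 1
    return {kmer: n / windows for kmer, n in counts.items()}


def kmer_position(kmer):
    """Return (row, col) of the k-mer's box, with row 0 at the top.

    Quadrants follow the text: A upper left, G upper right,
    C lower left, T lower right. The first letter is assumed to
    pick the coarsest quadrant, each later letter a finer one.
    """
    half = 2 ** len(kmer) // 2
    row, col = 0, 0
    for letter in kmer:
        if letter in "CT":  # lower half of the current box
            row += half
        if letter in "GT":  # right half of the current box
            col += half
        half //= 2
    return row, col


def fcgr_matrix(probabilities, k):
    """Place each k-mer probability into a 2^k by 2^k table."""
    size = 2 ** k  # same as sqrt(4^k)
    matrix = [[0.0] * size for _ in range(size)]
    for kmer, p in probabilities.items():
        row, col = kmer_position(kmer)
        matrix[row][col] = p
    return matrix


if __name__ == "__main__":
    seq = "ACGTACGTTTGACA"  # toy sequence, for illustration only
    k = 2
    probs = kmer_probabilities(count_kmers(seq, k), len(seq), k)
    for line in fcgr_matrix(probs, k):
        print(" ".join(f"{p:.2f}" for p in line))
```

The resulting matrix can be passed to any plotting routine that maps values to grey levels to get the white-to-black picture described above.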
