Synopsis_
The work presented here explores the aesthetic possibilities of using Helitron transposon DNA sequences from the maize genome as raw material for the creation of algorithmic art. Complex network graphs constructed from Wikipedia pages were re-mixed according to the aligned DNA sequence of a particular Helitron element belonging to the Cornucopious family. The work extends the artist's general efforts to integrate art with science, and in particular strengthens his recently developed approach of integrating genomics with algorithmic art, a novel avenue of work termed 'Geometric And Genomic AbstractionISM'.
Helitron Transposons explained: what are they?_
Eukaryotic genomes harbor transposable elements (TEs) that are capable of intragenomic multiplication by a mechanism that transfers a DNA segment from one genomic location to another. TEs can be divided into retrotransposons, which multiply via reverse transcription, and DNA transposons, which transpose without the need for RNA intermediary molecules. DNA transposons proliferate through single- or double-stranded DNA intermediary molecules. In eukaryotes, DNA transposons can mainly be divided into three classes: (1) those excised as double-stranded DNA and reinserted at a different location in the genome ('cut-and-paste' transposons); (2) those that transpose via rolling-circle replication, such as Helitrons; and (3) Polintons/Mavericks, which are believed to replicate using a self-encoded DNA polymerase.
Helitrons are particularly interesting because they were discovered only recently, by computational means, by Kapitonov and Jurka in 2001. They lack the structural hallmarks of other DNA transposons, such as Terminal Inverted Repeats (TIRs) and Target Site Duplications (TSDs). Instead, Helitrons harbor conserved TC and CTAG sequence contexts at their 5' and 3' termini, respectively; palindromes (16 to 20 bp 'hairpin loops') located 10 to 15 bp upstream of the 3' terminus; and flanking A and T host nucleotides at the 5' and 3' termini, respectively. A remarkable feature of Helitrons is their capacity to capture gene sequences, which makes them causative agents of allelic variability and gives them evolutionary importance.
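The terminal hallmarks described above lend themselves to a simple string check. The following is only a minimal sketch in Python, not the method used by HelitronFinder or HelitronScanner; the function names are hypothetical, and it assumes the input is the element itself, without the flanking host nucleotides.

```python
# Hypothetical helpers for illustration; not part of any Helitron tool.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome reads the same as its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def termini_hallmarks(element: str) -> dict:
    """Check for the conserved 5' TC and 3' CTAG sequence contexts.

    Assumption: `element` excludes the flanking A and T host nucleotides.
    """
    element = element.upper()
    return {
        "starts_with_TC": element.startswith("TC"),
        "ends_with_CTAG": element.endswith("CTAG"),
    }
```

For instance, `is_palindromic("GTTGCAAC")` returns `True`, because the stretch pairs with itself when the strand folds back, which is what allows a hairpin stem to form.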
Helitron transposons were proposed to move via rolling-circle replication (RCR) because autonomous Helitrons encode RepHel protein domains related to the prokaryotic Rep protein involved in RCR. However, most Helitrons found in plant genomes are non-autonomous and thus encode non-functional RepHel proteins. Furthermore, some Helitrons are agenic (they contain no captured genes), such as the elements of the Cornucopious family recently discovered in the maize genome.
How did this collaboration come to be?_
The Cornucopious family of Helitron-related sequences was identified by Dr. Chunguang Du (collaborator in this project) and colleagues, who developed and used Helitron discovery algorithms such as HelitronFinder and HelitronScanner. I had the opportunity to meet Dr. Du while I was a graduate student at the Waksman Institute at Rutgers University [2006-2013]. Our labs held joint laboratory meetings each week, and that is where I learned about Dr. Du's research on Helitron discovery and annotation in the maize genome. Six years later we decided to join efforts in exploring the use of genomic and genetic knowledge pertaining to Helitrons in maize as raw material for computational art.
I have previously explored the use of maize genome concepts to create computational art: the transformation of glitch art by subjecting an image to processes analogous to those shaping the evolution of the maize genome; the creation of visual art and experimental sound addressing the reduction in genome diversity at the ba-1 gene in maize owing to plant domestication; and the auditory rendering of CG and CCG sequence context variation at the VERNALIZATION 1 gene in maize. All these projects were created as a means to develop the aesthetic, discursive and materialistic components of a new artistic discipline termed by the author 'Geometric And Genomic AbstractionISM' (GAGAISMO). As a discipline, it encompasses and reflects upon the practice of using genome data as raw material for art, either directly, by co-opting scientific principles and tools, or indirectly, as a source of inspiration. Just as bioinformatics includes a set of computational tools used to access and manipulate genome data for research purposes, GAGAISMO includes a set of computational expressive tools used to access and manipulate genome data for artistic purposes. It is geometric abstractionism guided by genomics and enabled by computers.
In the work presented here, I explored the possibility of using maize Helitrons as a departure point for the creation of visual works using algorithms. I was interested in using network graphs and their re-mixing as primary elements for visual impact and aesthetic investigation.
Construction of network graphs from Wikipedia pages related to Helitrons_
I recently explored the use of NetworkX, a Python library for complex network construction, in conjunction with Gephi for network visualization and analysis, to visualize the application of Natural Language Processing algorithms to Facebook messages exchanged by the author with his friends during a two-year period. I then decided to keep exploring the construction and visualization of networks and to use them as raw material for visual art creation.
Node and edge data from Wikipedia were collected automatically with the Wikipedia Python module. Four concepts related to Helitrons were used as 'seed pages' from which to build the networks. Wikipedia pages were treated as nodes and the links between the pages were treated as network edges. Snowball sampling was conducted to discover all nodes and edges of interest. The Wikipedia entries used as 'seed pages' were the following:
Transposable Element_ https://en.wikipedia.org/wiki/Transposable_element
Vladimir Kapitonov_ https://en.wikipedia.org/wiki/Vladimir_Kapitonov
Jerzy Jurka_ https://en.wikipedia.org/wiki/Jerzy_Jurka
Processing involved the seed node itself and its immediate neighbors (layers 0 and 1), and as a result a directed NetworkX graph was created. It is a directed graph because the edges representing HTML links are inherently directed: a link from page A to page B does not necessarily imply a reciprocal link (from page B to page A).
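The two-layer snowball sampling just described can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the author's script: `get_links` is a hypothetical callable that returns the titles linked from a page (with the Wikipedia Python module it could wrap `wikipedia.page(title).links`), and the edges are kept in a plain set instead of a NetworkX graph so that the sketch has no dependencies.

```python
from typing import Callable, Iterable, Set, Tuple


def snowball_edges(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
) -> Set[Tuple[str, str]]:
    """Collect directed (source, target) edges for layers 0 and 1.

    `get_links` is a stand-in for the Wikipedia lookup (an assumption).
    """
    edges: Set[Tuple[str, str]] = set()
    layer1 = set()

    # Layer 0: links leaving the seed page.
    for target in get_links(seed):
        edges.add((seed, target))
        layer1.add(target)

    # Layer 1: links leaving each immediate neighbor of the seed.
    for page in sorted(layer1):
        if page == seed:
            continue  # the seed's own links were already collected
        for target in get_links(page):
            edges.add((page, target))

    return edges
```

The resulting edge set can be handed to `networkx.DiGraph(edges)` to obtain a directed graph. Pages reached only through a layer 1 page appear as nodes, but their own links are not followed.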
As a network measure, the author focused on node indegree (the number of edges directed into a node), with the indegree of a node equaling the number of HTML links pointing towards the respective page. A Wikipedia page that receives many links is likely to be of wide interest. Nodes with only one connection were intentionally removed from the graph to facilitate visualization and make the graph more compact. The output of NetworkX was imported into Gephi via GraphML files and visualized accordingly (Figures 1 to 4).
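Indegree counting and the removal of singly connected nodes can be illustrated as follows. This is a minimal sketch on a plain set of directed edges; it assumes that 'only one connection' means a total degree (incoming plus outgoing) of one and that pruning is done in a single pass, neither of which is confirmed by the original workflow. The function names are hypothetical.

```python
from collections import Counter


def indegrees(edges):
    """Count, for every node, the number of edges pointing into it."""
    return Counter(target for _, target in edges)


def prune_single_connection(edges):
    """Drop every node whose total degree is 1, together with its edge.

    Assumption: one pass only, so nodes left with a single connection
    after pruning are not removed again.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return {(s, t) for s, t in edges if degree[s] > 1 and degree[t] > 1}


def top_nodes(edges, n=25):
    """Return the n nodes with the highest indegree as (node, count) pairs."""
    return indegrees(edges).most_common(n)
```

In NetworkX the same quantities are available through `DiGraph.in_degree` and `DiGraph.remove_nodes_from`, and `networkx.write_graphml` produces a file that Gephi can open.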
Figure 1. Network visualization of Wikipedia pages relating to 'Transposable Element'. Node and label font sizes represent the indegrees. Color differences represent community structure of related pages within the network. The network graph is composed of 6,512 nodes and 26,493 edges (4.1 edges per node on average). The top 25 most connected nodes in the graph are as follows: 93 Genome - 90 Dna - 83 Genetics - 73 Gene - 71 Transposable Element - 66 Eukaryote - 60 Protein - 57 Mutation - 56 Chromosome - 54 Retrotransposon - 51 Virus - 45 Transposon - 44 Bacteria - 42 Rna - 41 P-Element - 41 Horizontal Gene Transfer - 40 Repeated Sequence (Dna) - 38 Dna Replication - 38 Prokaryote - 38 Helitron (Biology) - 38 Cell Nucleus - 38 Citeseerx - 37 Alu Element - 37 Microsatellite (Genetics) - 37 Long Terminal Repeat.
Figure 2. Network visualization of Wikipedia pages relating to 'Helitron'. Node and label font sizes represent the indegrees. Color differences represent community structure of related pages within the network. The network graph is composed of 1,528 nodes and 4,145 edges (2.7 edges per node on average). The top 25 most connected nodes in the graph are as follows: 20 Dna - 18 Gene - 17 Genome - 16 Eukaryote - 15 Protein - 14 Rna - 13 Genetics - 13 Chromosome - 12 Molecular Biology - 11 Citeseerx - 11 Virus - 11 Intron - 10 Base-Pair - 10 Nucleotide - 10 Saccharomyces Cerevisiae - 10 Evolution - 10 Enzyme - 10 Bacteria - 9 Phenotype - 9 Mutation - 9 Integrated Authority File - 9 Prokaryote - 9 Cell (Biology) - 9 Amino-Acid - 9 Organism.
Figure 3. Network visualization of Wikipedia pages relating to 'Vladimir Kapitonov'. Node and label font sizes represent the indegrees. Color differences represent community structure of related pages within the network. The network graph is composed of 1,391 nodes and 3,936 edges (2.8 edges per node on average). The top 25 most connected nodes in the graph are as follows: 18 Taxonomy (Biology) - 15 Animal - 14 Genome - 13 Wikidata - 12 Encyclopedia Of Life - 12 Integrated Authority File - 12 Wikispecies - 11 Protein - 11 National Center For Biotechnology Information - 11 Integrated Taxonomic Information System - 11 Inaturalist - 11 Eukaryote - 11 Genetics - 11 Global Biodiversity Information Facility - 10 Evolution - 10 Enzyme - 10 Chromosome - 10 World Register Of Marine Species - 10 Citeseerx - 9 Phylogenetic - 9 Model Organism - 9 Gene - 9 Virus - 9 Species - 9 Eppo Code.
Figure 4. Network visualization of Wikipedia pages relating to 'Jerzy Jurka'. Node and label font sizes represent the indegrees. Color differences represent community structure of related pages within the network. The network graph is composed of 1,020 nodes and 2,840 edges (2.8 edges per node on average). The top 25 most connected nodes in the graph are as follows: 13 Integrated Authority File - 13 Evolution - 12 Genome - 12 Genetics - 10 Transposable Element - 9 Biology - 9 Molecular Biology - 9 Population Genetics - 9 Phylogenetic - 9 Eukaryote - 9 Dna - 9 Bibliothèque Nationale De France - 8 Speciation - 8 Virtual International Authority File - 8 Worldcat Identities - 8 Oclc - 7 Gene - 7 Computational Biology - 7 Chromosome - 7 Evolutionary Biology - 7 Système Universitaire De Documentation - 7 Phenotype - 7 Wayback Machine - 7 Helitron (Biology).
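The 'edges per node' averages quoted in the four captions follow directly from the reported node and edge counts. As a quick check, this short Python snippet divides edges by nodes and rounds to one decimal; the dictionary layout is only an illustrative choice.

```python
# (nodes, edges) as reported in the captions of Figures 1 to 4.
graphs = {
    "Transposable Element": (6512, 26493),
    "Helitron": (1528, 4145),
    "Vladimir Kapitonov": (1391, 3936),
    "Jerzy Jurka": (1020, 2840),
}

# Average number of edges per node, rounded to one decimal place.
averages = {
    name: round(edges / nodes, 1)
    for name, (nodes, edges) in graphs.items()
}
```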
Figures 1 to 4 show that the topologies of the networks differ remarkably from each other. The most complex network graph was the one built from the 'Transposable Element' Wikipedia page, with more than 6 thousand nodes. Interestingly, the Wikipedia pages of Kapitonov and Jurka (the scientists who discovered Helitrons) are quite different from each other, with Jurka's page being more detailed and informative and giving a more interesting network graph as a result. The 'Evolution' Wikipedia page was shared among the top 25 most prominent nodes of three network graphs (Helitron, Kapitonov and Jurka), suggesting the importance of this term when referencing Helitrons. Topology and aesthetic differences among the network graphs were used to compose a visual artwork relevant to Helitron transposons in maize and to one of the scientists who contributed to their discovery and further characterization, Dr. Chunguang (Charles) Du from Montclair State University.
Re-mixing of network graphs based on DNA multiple sequence alignment of the Cornucopious family of Helitron sequences_
I previously developed a methodology to create algorithmic visual art based on multiple sequence alignment of DNA and protein data. This approach was used to re-mix the network graphs shown in Figures 1 to 4 according to the nucleotide sequence of three Cornucopious Helitron elements post-alignment (Figure 5). The DNA sequences used were kindly provided by Dr. Du and are presented below:
Accession Number for Related Data: >AC186626.4-Contig45
, Gi Number:
Genome sequence: reverse complemented Genome Size: 74569
TCTCTACTACTACATAAG, 17016, 17033
GTTGT CG TTGC AA CGCA CG GGCACTCAC CTAGT
GTTGTCGTTGCAACGCACGGGCACTCACCTAGT
Found at 64560, 64592
Hairpin to End: 9978, 10010
GTTGTCGTTGCAACGCACGGGCACTCACCTAGT
CACGGGCACGCAACGTTGCTGTTG
110001110000000011100011
Helitron Sequence Location: 9978, 17032
TCTCTACTACTACATAAGAAGCTAATGTAGACGTTCACAAAAGCTTTTGGTGCACGGTTCTGCCGAGGACCTCCGCCATCAACGCTCGCGATCGGACCAATACGCGATATCACAGCACCGCCCGTTGCCGAGGCCGCCATCAACGCCATCCTCCCGCAATCCCCGCCCTCGCCACCTCCATTTCCTCAGAGGAAACGCCTTGCACAGCACGACTCCTCCCCATCTACCGCTACCTTTCTACCTACCCATCTAGCACCGCTCGCTCGCAAATCGCAATGGCGCTGGCCACCAACTCCGCCGCAGCCGCCGCGGCCGTGTCCGGCGTGGAGGCGACGATCCATTCCGCCCCCCGTAGGATTTTGCTCCTCCCGATCCAGACGTCAGATCTCGCATGGGCGGCACGCGGATCTGGTGGTTTCTCTAGAGGCGGCGGCGGCGGCCGGAACAGGGGCAGATGGCGCTCTCGGGGATGCGGGGGCTCTCTGTCTTCATCAGCGACCCTCCTCCGACGCTCCTCCGACGCTCGCCCCGCTGTGGGACCTCCGCTGGCGAGCCCCTCACCGCCGAGGGACGTCCGCTGGCGAGCAGACCAGGTCCATCATTCCCTATGCTCTTCTCCTTCACCATGGGTTAGGCAAGTTTCCTACCTCCTGGTATACGTTATAACCCTACCTCCGTATACAGTGCACCTGATACTGTATGGTTGCCAAAGGCCCGCAGCGGAAGCCTCCATGTCTCATGTATGCGCGCCGCCAGCCCCTGCCGATTGGAGACCACGCTGATTGGAGCCGCCGCCGCCGATCCGTGGCCGCCGACGCCGATCTGTGGTCGTCAACTCGTCCGTCGCCACGAGGTCGGTGGTCGTGCTCTGTTGTCACTCGTCAGCCCATCCACCATTGCTAGGTTTGTGGCCATGATCTGCCGTCTGCCCTGCCCTTCTTTCTTCTCAAAGTCTCAAACTCCTCCCGTTCTTCTCTGTTCTTCTCAGATCTCATGGCCCGCGGCAAGTTCTGTACTTCTGTTGGCGCCTGACTCTGACTGCTAATTGTTCCATTCTCTGTTCTTATTAAATCTATAGTAACCACCTTATTTTTCTTCTTGTTTATTTCTTTTCAGGGAAACCATTAAAGGGGTTTAACTTTCAATAGTATATAAATATTAATGGGTTGCCGGTATGATTTTGTTATCCCTTTTTACCTCCAGCTTGGGGCATGAAGCTACTTACATGACATATATTAAATATAAAAATTTGTACAACAATTTCATGCTAGGACAAGCGCAGCTCCTCCAAACTTGGATGAACAACAACGTAATCAGGTATCTTGTCTTCTCTTGTTTATCTGATATTCTGATGTATGTTATTCTTATTTGTTTGTTCAGTGCTTGTTTATTTGACTGCTAGAACTAGCAAAAATATTCTTCCAATGACACTATCCGTCCACAATATGCCTGGAAAAAGAGAGTACTTTGCTAGAAAAATCACCATAATGCAAGTGAATCATCAAACGAAACAACAGATTTTCGCAGAGCTATGTTAATACGGATTAAACCTGTTGTGCCTATGTTAATACCGTTCTAATTGTAAGACACTTGTCATTTTTCCCTACACCACGGTTGCAATGGAAATCGATGCGCTAATTTAGCCAAAGAGACAACTGTAGCATTGTGCTTACTACTCAAAAATAGAGACTGTACTATATATCTACAGTGCTGCACAAATCCTGACGAGCAATTTGCTGCTGCAAATTTACGTCATAATGCTCTTTAGATTAAGCAATTCTGCCCAGTTCAGGAGTTCCTATGATGCGGACATGAATGCATTCAGAGACCCTTTACAGCGAAAACACCTTATTGAAGTATCTAGCAGTACAAACTACAATCGATCATCCAGGATAGCACATTTTATTTTTTCCACATGATATCTATGTGTCATAGAGATCGTTAATCTCAGAGTCTAGACTGAACACAAGTGTTACTCGGGTCAAGCTAGATGGCATATC
CTGCCACTATTTAGCGTCTCTGCATCTAAGCCATCAAAGGTCACGAATCTCACTATCCCCGGAGATCATCCAATCCAAAAACACTTACCGTAAACTTTTGCTACCTTCAGTTCAGTAGTAATGTTGTCGTGTGTCTATACTGTACCGTTAGACACTTAGATAATACCTAAAGCTTGTGTGATCATCTGTCGTCTTACAATTCTTTTTTAAGTGGGCAGCTACTGTGCTGTGCCGCTCCCCAAGATCTGTTTGCGCTGTTCGTGCAAAGCCTGCCTACACCATTTTAGTCCCGCTGGTTTGTTTCTTTTACGAATCGTGCATGCAGTTCTGTATTTACGTGATGGTTCATGGCAAACGCTGCGTCATCATAATATAGCTACAGTACTTTGTAAGCTGTTCTTCAACTGTTCTATATTATGATTGTAAGATGCAAAATATGTGCACGCTTCATTGGTTTCGCGATTTTAAAATGTTGTTTTTTCCAGTGTGAGCGTCGTGACCAAGGACCGGCAGAATGGCAGGTGCGCAGTGCGCTCCATCTATGAGCAGCAACAGCATCAATAAGGGATGCGATTGGTATGTGGGCACAGGAGGGCGCACCATGAGGCATGAGCTGCCAGCAGCCTGATGATGCGTCGCGGCCAGTCCTCCGATTCCGTGTTCTTTGTTCCTGGTGGATGACTTCCCGAGTGGTGAGGTTGCCTGGGCCCTGGGCCAAGGGAAAATATTAACGCTGCGTTCGGTTTGAATGTTCCCTGAACTAGGTAATATAGTGATGCATGGTATGCTCCACAAATAGTCTACTAATCAAACACACTAAGCAAAGAAAAGCCATGGTAAACTTTTGCTACCTTCAGTTCAGTAGTAATGTTGTCGTGTGTCTATACTGTACCGTTAGACACTTAGATAATACCTAAAGCTTATGTGATCATCTGTCGTCTTACAATTAGGAACTAAAATATTTCAACATTTTGAGTGCTGCAAACAAAGAGTTAATTCTCATTGAACTCAAGAACAACCTGGGAACATATGAGTCCTTTTAAGATGTTGTTATCTGCTCATACCATGCAGTACTTTTCCTCAACAAGTGGTGTTGTTTTAAATACTAGAATAGAACTACATCCATGAAAACTATTACTGCTCATATCCACGACAAAGCGCCAGTGAACGTGTGTGCGACGAGGAGGGACCGCCTGCTGTTCTGAGTCCAACCAACTAGACCTACAAGTGAAAGGTATGCTATATATAATATGGTGCTATTTTGTCGTGATCCAGATTGTTATTGTAGGCCATACACAAAACGGGGACCGATAAAATAAAACCACAATTACTTCTGATTCTGAATTGGAAGCATTACATAGCTTCGAAGGTATACACTTCCCTTTTGATTCTTTGGGCAGTTCAACCACATTAGCTACTCCATCCACACTAGCTACTCCATCCTCAGATTCTAGAACTGCTCATCATTATCTTCATCCAGGTCAACACCTTTGTCATCTCTAATTGTTTTAGCGTCTTGATGCCATTGTACTCGTGGACAGGGGCTAGCTGTCTTTGTGTCATTGTACTTGTGGACATGGGCTAACTCATGTAAGTTTCAGACTCTCAGGTATTCTCTATTGCAACATTTTCAGGCTAGAGCACAGTGTTGTTACCTAATCCTTTGTTTTATTTTTTTACCTTCAAGCTCTGGAATTACATTCTCCTTAAGCATAAAAGGAATGCCACTATAATTCTGGAAACATTAGCACATAATCCTATCTTTTAAAGGGTTATACCTGGTTCTTCCTGTGCAATCTTACCAATTTTTGCATGCAATCTTCCTGGTTATACATGATAATTATGAAGAGGGACGTATGGTAGCGGGTTCCAGAGGCAGCAGACCGCTTGGTGCCTAGCTGGGGATGCAGGCGGACGTGGTGGGAGCGAGCAGCGCCGTGGTTGGTAGGACTGGCGATAGATGACAGGGAAGAACTAGCATGTTTTTTTTATATATGTCC
ATATCAATATTTTATACTAATTTTTAACTCTCACGGCAACGCAAGTATACAATGCTGCTCATCTAGGCGCCACTCACCATGCTTTGCCCTTATACTATGACTTATTAGACTGATTTGTTATTCCTGCACACAAATCTCAGGAAACCAAATGATTGAAAGACTTCAACATATATTTGCTTCATCAAGTCACACATATGGTGGTTTGACTCTAGATTTTCATTGTAATCGACTGGGTCCAACTACTTTATTTCAAGTTTTTACAATCCTAGAAACTTATCTATCAAAGTGAAATAATATGATACTCCTATCATCTCATCGGGATTTTGGCTTAAATTCTCCATTCAACTTCTGCACATTAGTAACATGCATCTTGGATGTTATAGTAATGAGAGGAGGTAGATGATATAATTTGCCTGTTCCTGTTCCAGTATAAATTCAATTCTGGTTTCTTTGGATGCTCATTTCCAGTTCAATCCTGCCCTTAGCATGGGGATGTGGTGAGGTGGTACATCCAGTCCGCGGAGCCCAGGATGAGGTGGAGCGCCGACCTCCACTGCAGCTCCGTGCAGGCCATAGACTGCCTCGGTGGCCAACACAGCACGTTCTCCTTCCTCTGCCTCTTCCCTTCATCATAAAGCCTCCCATGGTGTAATCTTGCCCTCGGTTTTTGTATGCTCATGTAGTAACATTGCCATTGAGTGATTGAGTAGAATTATTCAGTGATTTGTTCCCTATCGGTTCTAATATTGATTCCATTCTTTCTTTCGAAGAAACTATTGGTTGTGTTGTGTGCTTGTGTGTATTTCTCCTACTTCCAATCACGTTTTCAACAGCCTAGCTAACTTTTGTTTTGGCGGTGTAATAAATCGCAGAGGCTACACCAAAGCTCATTCTTCAGTTCATGGGCCATGGGCGCCAGGGGGCTCACCATATCTCATGTCAAGAGTCATCTCCAGGTTAGTTGTTTCACTTTTTCATCCATTGCCTGGCATGCACCTGTAGCTTCTCTGCCACTCATGTCTTCACAAAATCTTGTAAGCATGGCTGCCAATGTTATCTTAGCCTTTTTCTAAAAGAATTATTTTGCTTACGATTGTTATGTTTGTCCACTGTTGTCGTAACCAGTGTAGGTCCTCTAGATTTTAGCAAACCTTGAATCATTCTGCAATTCTTTTTATGGTGAGTTAACAAAAAATAATTGTTATACTGTGACGATAAAGGCGAAACAACTTTTTAGGTTCTTTTTTGTAGAATATAACAAGCTGATTTAATCTTGCAAGTTGTTCAATGTAGACCTCATACTTATCAGTTTCTTTCTTGTTGTGGGATATCTCAGATTTTATCTCAATATTGACTTTAAGTATCGTAAGCCTTTGTCAGTTTGTGGTTGTTCATTGTTGTGTCAATCTAAGAATATTGATTGCAAATTCTGTTGTTTGGACAATGAAATTTACAATATTTAGAATATATGTTAGACACTGAATACTGTAATGTTGTCTTTGAGTTGGTGGACATTGATAAAAAGGAGATGGAGATGATCAAGGACCTGCCTGAAGAACTCAAGCAGGACATCAAGCGCTACCTCTGCCTCGAGCTGGTTAAGTAGGTACAATCATTAAAGTCACTTGGGATCTTGCCTAACTTTTTTACATATGAATGTGCGGTGAGATTGTTACAATATTTTCCATTGCCAGGTCTCGCTGTTTCATGGCATGGACGACCTGATCCTGGACAACATTCATTATAGTTCTGAATGAGATATGGATTGTTCTGTTTAGGTTATGATCCTGCTATGGGACTGCTTCAGTTATGACCCAAATTCTAAGTTGTGACTATCATTGTGTGCATTTGTTTTCTTTTACTTTAGAAAAGTAGCTCATGTTCTATGTCAGCTTCTCTCAAAATCCGAAACACGTTTTTGAGGTTGGCGTAGTTGTAGTTTGGCGTTCTGATGTTTATCAATTTTGTTTTATATTTTTGCATCCAAAATTCACTTGTCT
TTCTTCATAGTCTGTTTCAAAGATGCCATGAATGTTTTAGACAATTGGTTTACAGTTGACAGACTTCTTCACTTAGCTTGTGATCTACAACTGATGTACAAGCTGACATATGTAGTTCATTTGAGTGGAACTGCAGCCGCATCAAGATTTCTTTCATTACTTCTACAACCCATTCTAAAGAGATTTTTGTTATATAGGATTCGACCGAGGATAAGCATTGGTCATATCCATGATTTTGACCACTAATATTGTTGCTTATAGTTGACAGAAGCTCTCACTTCTTCACTTCTAGATGGAGTCCTTTTGATACTTGGGGTGTTCGGTAAGTTGGTGTTCAATTGGAGTATGTTGCATAATTTCATAGCTTTCTTAATATGCCAATTCTGTTGATGGAGCTTTACCGACATGCCAATTTATAAATGGAGCATGTTGCAGTTAGATGTAGTCTATTTCAGAATTGAGTGTACAAATAGATTTTACTTATAATGTGTTGCATAATAGGACTAAGCTTTAGGGGAGTGCTTTTGTACCAATGGTAGTAATTGGTTAGTATCTTATGATCTTCATGAGAAATATGACATCGTTATATGACTGTATTTGGTAGCACCTTATGCAATTTTTTTGAATTGGCCAAGTAGTGTGGTTTCGTGCCCATAATAGAATAGTGACACTTAGTTGATCTTTTGTTATTCTTTTTCAGATGTGAAGACCAAGTAGGAGACAACTCATGGGCATAAGCATATTTTCTAGAAGAGGAGAGTAGCACTTGATGACTTTGATAGGTTCAAAGTCATGCTATCAAATGTTTAGTTCGTGTTGGCACTGTTTCTATTGCTCGCTCACACTTTTTTCTTTATGTAAACAGAGGGTTGGTGCTATCAGGTAAGAGCTCGCCAAGTTGAAGAAGGCATCCATGGCTTAATCGAGATTTATTGTTTGTATATATCTTATCATAACATTTTTACTTCGTAGCAACACATGAACATTCACCTATTTGTATATAAGTTATCATGATATTTATAAGTTGTCGTTGCAACGCACGGGCACTCACCTAGT
5' end
TCTCTACTACTACATAAGAA
AT content is: 0.7
CG content is: 0.3
3' end
AAGTTGTCGTTGCAACGCACGGGCACTCACCTAGT
AT content is: 0.457142857142857
CG content is: 0.542857142857143
Accession Number for Related Data: >AC191691.3-Contig129
, Gi Number:
Genome sequence: reverse complemented Genome Size: 24998
TCTATACTACTTATTAAG, 16548, 16565
TCTCTACTACTTATTAAG, 12973, 12990
GTTTC CG TTGC AA CGCA CG GGCACTGAC CTAGT
GTTTCCGTTGCAACGCACGGGCACTGACCTAGT
Found at 18511, 18543
Hairpin to End: 6456, 6488
GTTTCCGTTGCAACGCACGGGCACTGACCTAGT
CACGGGCACGCAACGTTGCCTTTG
110011110000000011110011
Helitron Sequence Location: 6456, 12989
TCTCTACTACTTATTAAGGCAACAAGGGTAGCCTACCTCCCTAGGTTCTGTCGTTCTGCCTCTCCTTGTTATGTCTATTCCGGACTCTGACTGGTGGGCCTCCCATCTCTATATCCCTGCACATCCTTGTGGCCCACCATGTCCAGGGCATTTAACAAAAAATGGGATGTGTGGAGAGAGTGACAAGACGATAGTAGCAGCGGAGCATATAAGCCGGTGGTAGCATCGTTCGATGGTCGGTCTGGTACAAATCCTCAGTTCATAAACCGTCAAGACGAAAGCCATAGGCATTCTGGACTGGGTGCTTTGGTTCGGCGCAGATTTAATGGAAAACAACAGATCAGACGTCCTGGGCTAGATGTTTTGGTGTGGCCCAGATTTCATGAGAAATAGGTGTGTCTGCTCATCATCTTTGTGTAGGCAGCCTTCGTTCTCCATGGATGCTCACGACACAATCAGAGCAACCACTCTCCCTTTCCACGACCATGTCCTCCTCTTCTCCCCCTCAATCACGAGTTCATTATCGTCCACAACCTCCCCTCCTCAAGGACAACTGACTTTGTCGACACCCCTTCTACAGGCTCCTCAATCCCCACGCCATCATTGACCCCAACTCCAAGCTCGAGTTTGATATGCCCGATATCTTGCGTGTGTACAAGATCGACTGTGTCGAGTGCTTCGACGGCACTGAGATCGTCTTGTCGTCCCCCCAATGGCAATTCCACCAACGACGTCATGTCCAAGGACTTCACTACAGGCATCTCCTCCCACCTCTACCTACCCGCGGGGGTGGATCCTGAGAAGAAGCTCCGCATCGTTGTGTTCTTCCACGATGGTGCATTCATGGTCCACAACGCCTCCTTCCCGTTGTACCACATTTACGTCGCCTTCGTCGATGCTGCTGTGCCCACCAGCCGCTGATCGTGCTCCCTCGCTCGCGATGCTCCCCCCTGACTGGCCTCCCCCTTTGTTTATTGAACGATTCTTGTATATTATTAGGTGAGGATTGGTAAATTCATTGACTGTTCTTGCCCCCCTCTCTTAGGTTCGGGTGTTGTTGTTGCGTGCACAGGTTCAATTGTCTACACTTGTACAACAATTTAGTGAGGTTGATACTAACTGTATGCTCGCTTTGCTTTCTTTTGTGTCGCAGTTGTGTACAAGATCGATTATGCTTAATATGAAGAACTAAGGAATAGAATTCAGAGATAAAGGTTGTAATTGTTTTTGCATGTTTCATCTTCTGTATATGAATTTTGCTTGTTACATTATGTGTGTAATTGGTGTCGTTTCTGATTATTGATTGTGCCTCTTACCAGTTCGATATGTTGTCTACACTGCAACTTATCCCAACAATAAGAGTGAAAATGTGCCAATTTTATAATCTTAAATAGACATGAGACCAAGGTGAGATGTTGTTTTTTAAAAACTCACCCTCACTATTTGTTTCCACATGAGCAATTTGTATTTTAGTAGCATTTTTGCATCACACTTTATTACTATGTCTTGTGTAAGGCTCTAATTGTTGTTAAGTGTTCGTGTATGCCCAGGTGATTTTTTGTTATCTTCCTGTATGTGCAGGTTTAGTGTTCAAAATGGCAGCCTTAGGCAGGCTTGGTGGTCTTCTGAGTTAGGTTTAGGGCCACTAGTAGCTTGTCTCCTTCAGTGTTCAATGCTCCTCATGTCCACCAGGCTATTCATTGGTGGTGAGTTTTTGTTGTCTTGTTTCCTCTAATATACCAATAGTCTTCATTTATGTTTATTAGTGTTCTTTGGTTAGGTCTTGACGTGAAACTAAGACAAGCATTCAGTTAGTTTGGAGAGGTTACTAAAGATTTGTATACACATTTGAATGCTATTATGATGTGTGTTAGTGGCTCATTGAATGGTATTGATCCAATTTCCTCTACTGTTGTGTAATGGTAGATTACGAACAGCATAACAACTTCGATCATTGGTCTTTTGGTTACTGATATAGTAGTGATGTACCTTTTAGACTA
GAGGAGAAGTATGTGGTGCAAAATGTCAATTCTCCAAAAAAACAGTAACAGTAATAGAGCCTTCTTTCAGAAAATCTAGACATCATCTAAAAAATATATAGCCAAATGAGAAGGCAAAATGCCAAAAAACAAGAAGGAAATGTTTTTAGTGATATTTCAGCATCCAGCATTATATACTTCTATGTTCTCTTCCATCCACAACATTATTGAATTGTTTCTGTTATATCTTACTTTGACTAGAAGTAGGATTTCCTGATATTTTTCCTTTTCCTGCAGAACAAACAAGATACTTGGTTTTTCTGTATCTCTCATCCTCATCAATCTGGCTTCAATTATGGAGCGTGCTGATGAGAATCTCCTTCCAACAGTTTATAAGCAAGTCAGTGCAGCCTTCAATGCTGGTCCTACTGATCTAGGATATTCACCTTTGTAATGAACTTTCTGAAGTCAATAGCATCTCCCTTAGTAGGTATCCTTGCTCTGCACTATGATCGACCAACGGTGCTTGCAACAGGGATTGTTTTGACTGTTAGAGTTAGCCAGTATTTTGGGCATGTTACATTCAAGAGAGCAGTAAATGGCCTTGGGCTTGCCATTGTAATACCTGGTCTTCAGTCGTTCATTGCTGGAACTGTTATTACTGGGTTTGACACATGTTAGATTTTTTTATTACATGTGGTCGAATATGCGATACAACTAAATTACTGCATGTTATTTCACTTTTATTGTTTGAAGTATGAATTGAATGGACTGTTCGTTTATCAAATGTAATATGTTAGTGTTGCCTCCGAAACTATAACATGTTTTCGCTTTTTGGGATATATTGCTTTTATTATGTATTTACACATACTGTATATCTAAGTGCATAGCAAAGACTATGTATGTATCTAGAGAATGGAAAATGTTTTTTATAATTTAAAATGAGGGAGTATTTAACAAGGACCAATATGTCCTAAATTTTAGACTTGCCTGGGTCATTATTTTGGTTAATACATGGGCCTCATGGTGCAAGATATGGAGCTCTAACTCAGCATACAACCCACTTTTTCCTTGGATGTAACTTTTGGGTGATTTGTTTGTAAATGTCGTAAGTGAATAACTAATCACAGGAGCATAAGGTTATTCTATTTTTTTGTGTTCCTTGCTATTTACTACTCATTGTACCTATGTTGTGATGAAATGTATTCTTGCTTGCAAAGTGGACCCTTGAAGACAACAGTTACTGGTGTAGAGATGTTCAAAAAATACTGGGCCATGGAGAGGTAATGATTTTGAAAGTAGTATGTGCTAGGTTTTGTATTGTTCCATATTATCTTACATGCAACTTCATATGTGCAGGCTGGTGACAATGTTGGTCTTCTTCTTCGTGGTCTTAAGCTTGGCTAGGTTACCATACTTGTCAATCCCATTCATTGGTAAGCATCTTGAGTCTTTTGGAGCTTCGTTCTTACTATGGGTCATGCAATATTCTGGAATCAATTGCTGGCTAAAAATTTGAAAGGAAAGTTGATAGTGGAGAACCATCATTGTAATTTACTGAATGAATAATGCAAGGAAACCCATCACTCAAAAATAACCATGCAGTTGACGAGCACAACAAGTGCGAGGTTAGTGGCCTCCCTTGCCTTCTGGTCTACTCATCCAGTTCGGTGCTTCTAGTCTCCTCGCCCTATGTGATAGGTCCCACGCTAGCCTCCACCTCGGCGGCATGGTCGTGGCAACAGGAGCATCGCCATAGGCAATGACACCGCGCTCCATGTGCCCCCACGGTCGTGCCTTAACAGACTACATTAAGTAACATGAGCAACTTAATAGACTACTATCCTAAATTGAATCAAAAACACCAGTGGTATAATATCTTGTCATGATTAGTTTTGTCAAATAGTCCTCAAATACGTATCTTGAAGATATTTGTCTGCTCATTGTTTTGGGTCATATATGATGTTTTTAAGTGATGTATGCACAAACTCAGTTATGTTTTCTAATTCGCTCATACTGTG
AAACTATGCTTTTATAACTGCATAAATCCTAATCCTCATATATGTAGTCTGCCGTATGGTGGCCCTGTGAATGTTGTGACAATGTTGTTGAGTATGCTGGCCCTATGACTCCTTTGACCATGCTGGTGAGCTTGATGGCCCTATGAATCCTGTGAATCTTATAGACGATGGCCAGGTTTTTTGTGAACTCACTCATTTAGTGCAGTTTGCACTGTGATGCATAGAGCCAAGGGCCTTGCTTTGCATAGACACTGAGTTTGTGATTTTATAACTGCACAAATCCTATCCTCATGTATGCAGTCTGCCCTATGATGGCCTTGTGATGCTTGGAGATGATGAACCTGTTCTGTGAACTCGCTTATTTTGTGCGGTCTGTCCTGTGATGCATGGAGTTGAGGGCCCTACTTTTTTACAGACATTAAGTTTGTGCAGTAACACATGTAATGCTTATACATGCTAGCATTGTTTGGTTCATGCACTAATTTTGTGCAGACTTCCTTGTGATGCATATAGCTGAGGGTTCTGCTTTTGTACAGGCACTGAGTTTCTGCAGTAAGACTTGTTGTGCTTATACATGTCAGCCTTCTCCAGTTTGGACAAAGAAGTTGTATGTTGGCGTCCTTGTGATGCTTATAGATAATGCACCTATTTTATTCAGGCTGATTTGGTAATTATGTCCTGTGATGCATTTGGGGAGTTCCTTTACCTGACATAAAGATGTATGTTGAGGGCTCTTTTCTTATGAACTTAGTGAGGTTATGTGTTTTTATTATTCTTTCATCAAAGAATGTACTGATTGTGTTTCTATTTGTTAACAGATGAAGAAAAAAGCTCCCAGTTTGGACAAAGCTCTTATGCTGTGAATCACTTGTGGGAGTTATGCGCAGAGAAGTACATAGGTAGTGCTACTGCACAGTGTAATTGTTATTGCTCTGAAATGATGTAACTATTATAGGGAACTGATGCTGATCCAATTATACAGTTCCTAGAGTTGGAAAAGTAGGTAACACTTTTTGTAAAGAGCATTACATAGAACAGATTCAAAAGCTATCTGAGAAGGTAGGTATTATGCGCTTCTGGTTGCTGATTCAATTTATGGACTCTCCACATTAGAGCGAGAGTGCAGCACAACTTTCATTGTCTGCTTATTCAGATATTAGTCCTTAAGTAGATATTATTTAGATAAAAAGAAGGCATTGCAGTGTCAGGACTGATAAAATAGGGGAAAACAAAAAGATACATCACTGTCAAATCCTACAACATAACTTCTAGGAAATACTCCTTAGGAATAGGAAGAGAAATCTAAGGTTTGTACTTCTCCTAAAAATTTAGGGCTATTAACCAGCCTTTGGTTCATGCCTTCTGAACAAGCATTTTCTTTTGTTTGTTTATAGATATATGAGCCCATTACATGACCATGTTTCTGTGATGGGACAACACGGGAGTGTCTTCGTTGGACACATATTTTTAGCAGCTTTTGTTGCTCGATCTTCAAGAAACATAGGTACTATCTCTATTTTTTACTTGGCGCTCAATTGTTCAAGTGTATACTAACAAATGTCAAATAAAAAATTGGAGGGAGTACTTTTCATGGAACCTGAAACCATGGAGAGCTAGAGGTTATATTGTTTTTTCTGTATATCAAATAACCTACTCCTACCGTGCTACTTGATTAATTCATCATCTTTGGCCAATTATTTATGAGTTGTCACATGGGTGTTGTTGATAATCATGTTTTTGGAAACATCTACTTATCTCACTTTTCCCTTAAGTGCTTTGGACAACTATATGTTGGATTCTGTATAGAAGATCCTCTACAACAACAATTATGTATTGTTTGTGTTGAAATTCACCCCTGCCAATCTTTTATGATTTATAAATGTTGTGTTTACTATTGTTACCCCTGCCATTATGATGTTCCAATGTTCTAAAAAAGGTTGACTTTTCACTTGCACTAATTGTTCTTTTTGAATTTGGATCTTTGCTAGCAAGATTCTTTT
ATCATATTGATTTTGTAGCAACATACGTGCACCTTACTATATCTTTTAAGGTATATAAACATCCACTTTTAGTGATGCTAACAAAAGTATAGAATAACAATTAGAGTTTTTGCATATCAATGTATTTTATCATATTGATTTCGTAGCAACGCACGTGCATATACCTGCCTCTCCTTCCCGTCGCCGCAACACAAAGCTCCGTCGGACACCGACCCGACGCTGCTCTACTGCCTGCTACGTTGTCCCTCCGCTGGCTCCCACGTACACCTTCTCGACCGCGCGGCTACTCCACCACTATCCCTCGAAGATTGCCCCCTCCCCAAGACTGCCAACCTCCACCTCACGTCGCGTGCCTCGCAGACGGGCGACCTCGCACCGCAGGCTTGGCGCTCGACCGTGGCCCGAGTTTCGCTTCTACGTCATCCGGAAAGCTCGAGGAGGTCATCCTCCATCTCATAACAGTATAATACACATTTGCTTATAAGTTATAGTGATATTATATGTTTCCGTTGCAACGCACGGGCACTGACCTAGT
5' end
TCTCTACTACTTATTAAGGC
AT content is: 0.65
CG content is: 0.35
3' end
ATGTTTCCGTTGCAACGCACGGGCACTGACCTAGT
AT content is: 0.457142857142857
CG content is: 0.542857142857143
Accession Number for Related Data: >AC191632.3-Contig9
, Gi Number:
Genome sequence: reverse complemented Genome Size: 38160
TCTCTACTAACTATTAAG, 37165, 37182
ACTTT CG TGGC AA CGCA CG GGCACACGG CTAGT
ACTTTCGTGGCAACGCACGGGCACACGGCTAGT
Found at 3725, 3757
Hairpin to End: 34404, 34436
ACTTTCGTGGCAACGCACGGGCACACGGCTAGT
CACGGGCACGCAACGGTGCTTTCA
000001111000000111100000
Helitron Sequence Location: 34404, 37181 Silico_101
TCTCTACTAACTATTAAGAGCTTATTGTAGACTGCCCCCGCCTCCCTACCCCGCCGCGACCGCTCCGCATGCCCGCCGCGAAGCGCTCGCATGCCCGCCTTGACCGCTCTGCACGCCCGCCGCGAAGCGCTCGCATGCCCGCCTTGACCGCTCCGCACGCCCGCAAAGCCCGCTGCGACCGCTCTGCCGCGAACCGCACGGCCACATCGGGTGCCGCACCACAACGGATCCTTCTTCAGCGGGATCCACCATTTAGGCCTCTGATTCGGCGCGGATTCACGGCGCGGAGCCGCCGAGGTTGCGTGCGTGATTCGGATTGCGGAGCCGTCCGATGGAGGCGCGAGCGTGCGGTCGAGGCAGCCCACGAGCCGCGCGTTGGCGTCGTCGAGCGCCTCGAGGTCCAGCCACAGCGCGGCGGCCTTGACGTCGAGCGCGGCCTGTAGGCGGTTCACCACGGCGTCGAACGACCCGCAGCAGTGCCTGGTTCTCGCGCACTTGGACCTCGAGGTGCGCCGTGAGGGCCCAACCACCGTCCGGCACGGGCGGGCCCACTGGCCCGCTCCTCGCGATCCGCTTGAGCTCCGACAGCCTCCGGAGGTGCGACACAGCGGCGGCGTCCGCATCCGGCAGGAACGTCGTGTGGGCGGCCTGCAGGTGGAGGTACGCCGCCTGGAAAGACGAGGTCGTCGCAAGCGCCGCGGCGACGGCGGCATCCTGCCCGGACGCCGCCTTCCGATCCCCTCCCCATCAGCGCTAGGGTTAGGGTTCGGCGGGTCGGGCTTCAGCACGACAACCCGCTGCCAGGCAATTACCCCATCCGCCCCCGGCGCCTGAGAGCGTGCTAACCGGTCGGCGTCCTCGTCTTCGTCCTCCTCGGCGGGGAACTCGATGGTTCGTGTCTTTCCTTTGAGTGGCCAGAGGCATTGCTGCCAGAGCCTATAAAACCATTATGCGATGAACATGATCCACCTTGTGCAAATCCTGCCCGTGCTTCAGTATCCTTATTCCTCCTGTTGTTTCCCTTGAATCACAGTACAGCATGATCAATACCAACCTATAGAAAGATAAAGGAGCTCACAAAGATATACTTATTGGATGAGGATCTGACGCTTGTCCAGGCACCTCCTTTATATGATCCATTCATAGTAACTGAAGGCACTGAGTCAGATGAACAAGACGATGAGCTATCATCCACAGTAGAGGTCCTCTTTCCTGCATGGTCATTTTGCACCTCACTACTTCCAGGAATAGTAGTTTGAGTTTCTGAAGCATCTATCTCCCAATTAACAGGGCTAGATTCACGGTCTTCAATATCAACATGAAGCACATCTGAATTATCGTCTCTACTGTCAGATATATCAGAAACTTCTTCAGGGTTGTCAGCATTTAAGGGCATCTCTTCAATTTGTCCACAAAAATCATCCAATATTCTATCATCTGAATGACTGTGGTCCATAATTATTTCCTTATTGATGTAAATGTGGCAACACTTGTCAGTTTCATGCTTTTACTAATGGAGGTCCTGTACTGTTCAAATTTAGAGAAATATTACATTTATGTTTTTGAGCACATATTGCTCCAGCTTAGATGTCTGCCAACACAAGTATTGACGTGCGGATGTTTACTTGTTTAGATCATTGAAGGATCTGGCAAGGTTGCTATTAGAGGGTGATTAATTATCTTACATTATTCCAGTTCCTTAGGAGCACTTAAGTTTTTAATAACTAATCAGGTATTCACAAGGTATGGACTCCAAAAGTTGTTTGGCCTAGACTGCTTGGCACTCACCTCAGACCAAACCCTAATTTACCCTGATCTTCCATCTAAATATGGTATGCATAATCGTTTTTAACAAGTTTGAGTCAAGGAATTTAAGAGAAACCAGTGTGGATTTTATATGTACTAACTCATGTTGTCAGTGATGATAATTTTGGTGTCTTTCTTTTGGATTCTGAGCATTGTGTTGACTGTAGTAAAGATGCATAGTCTTGCTGGGCTGT
TCGGTAGCACCTCCAGGTGTTTGGTTATGCTGGTACCTGGCAACACTCAATGGCCATGGAATTGGCAAATAGAGATCCTTGAAATGGTTGCCCACTAGGACCCTGGTAGCCAATGTTCTTGCTGCAAGCATAATGGCAGTTCTTGCTATAACATCCAAAGTGGTATGAGCGAACATTTTTTCATACAGATTCATTGTTTGCGGTTGGTATATTGACTCTTCCATACATCAGCATTTACACCTGATTTATTTAATTTTTCTTACAGGTAGATACAAAGCGATCGACAACTATTCGTAGTGGAATACAACCGACTTCCTTGGTTGCTTGAGCACAATGTCTACTTTTGATGCTGAAGTATATACAATGAGAAGAAGTGGGCAGATTGCCAGGGCCTTCATTTATGTTGCATCCACCTTCCTGCTTTTTTTCTGCTAGCACAACTCTCTTCATGGTAGGTCTGTAATGGATAGTAAATCAAAGTATTTATAGGCTACCATGGTTCAAAGTATTTATAGGCTACCATGGTCCATGGGCTAGAATAATCAGGATTAGAGATAGAAACTTATTACCAAATAGCCAATTGGTTGTAATAGTGATTCAATCAATGTATCTCGCTTTCAGGTGCCCATCGAGGATGCTACAGGCTCATCGACATCTCGGACAACACACTAACATTTTGCCACACAGATTTTTATATATTATAGTGATATTTGTTGTTTTATATTAATGTTTTATACTAATTTTTAACTTTCGTGGCAACGCACGGGCACACGGCTAGT
5' end
TCTCTACTAACTATTAAGAG
AT content is: 0.7
CG content is: 0.3
3' end
TAACTTTCGTGGCAACGCACGGGCACACGGCTAGT
AT content is: 0.428571428571429
CG content is: 0.571428571428571
Figure 5. Multiple sequence alignment of three Helitron sequences from the maize genome. The first 1,020 nucleotides post-alignment are shown in the figure.
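The AT and CG content values reported for the 5' and 3' ends in the listing above can be recomputed with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the sequence contains only the letters A, C, G and T.

```python
def base_content(seq: str) -> tuple:
    """Return (AT fraction, CG fraction) of a DNA string.

    Assumption: the sequence holds only A, C, G and T, so the two
    fractions add up to 1.
    """
    seq = seq.upper()
    at = (seq.count("A") + seq.count("T")) / len(seq)
    cg = (seq.count("C") + seq.count("G")) / len(seq)
    return at, cg


# 5' end of the AC186626.4-Contig45 element, as listed above.
at_5prime, cg_5prime = base_content("TCTCTACTACTACATAAGAA")
```

For the 20-nucleotide 5' end used here, 14 of the bases are A or T and 6 are C or G, which matches the 0.7 and 0.3 printed in the listing.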
In order to construct a new image from the remixing of the four network graphs shown in Figures 1 to 4, the first 1,024 nucleotides of Helitron sequence AC191691.3-Contig129 were used, since the network graph images exported from Gephi measured 1,024 x 1,024 pixels. Each network graph was assigned to one nucleotide of the post-alignment Helitron sequence as follows:
'Vladimir Kapitonov' > nucleotide 'A'
'Jerzy Jurka' > nucleotide 'G'
'Helitron' > nucleotide 'C'
'Transposable Element' > nucleotide 'T'
The resulting image is shown in Figure 6, which clearly depicts the re-mixing/collaging nature of the new visual creation from the original network graphs, with the collaging dictated by the post-alignment Helitron nucleotide sequence.
Figure 6. Resulting image from remixing network graphs according to the Helitron nucleotide sequence AC191691.3-Contig129 post-alignment (first 1,024 nucleotides).
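One way to read the nucleotide-driven remixing described above is column by column: pixel column x of the new image is copied from the network graph assigned to the nucleotide at position x of the sequence. The sketch below follows that reading, which is an assumption and not a documented description of how Figure 6 was produced. Images are represented as plain lists of pixel rows to keep the example free of dependencies, and `remix_columns` is a hypothetical name.

```python
def remix_columns(images_by_base, sequence, blank=None):
    """Build a new image whose column x comes from the image for sequence[x].

    images_by_base: dict mapping 'A', 'C', 'G', 'T' to an image stored as a
        list of rows, all with the same size and at least as wide as the
        sequence is long.
    Assumption: alignment gaps ('-') or other symbols leave the column blank.
    """
    height = len(next(iter(images_by_base.values())))
    width = len(sequence)
    out = [[blank] * width for _ in range(height)]

    for x, base in enumerate(sequence.upper()):
        source = images_by_base.get(base)
        if source is None:
            continue  # gap or ambiguous symbol: keep the blank column
        for y in range(height):
            out[y][x] = source[y][x]
    return out
```

With 1,024 x 1,024 pixel images and a 1,024-nucleotide sequence, every column of the output would be filled from exactly one of the four source graphs. The same column copies could be done with Pillow's `Image.crop` and `Image.paste`.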
Subsequent arrangements of Figure 6, together with Figures 1 to 4, were made after removing their white backgrounds using the Python imaging library 'Pillow'. Additional visual elements were created using Processing and added in layers to compose the final artwork. It was important to include the image of Dr. Charles Du, the scientist involved in the identification and discovery of the Cornucopious Helitron family in the maize genome, in the final work. As the ultimate background for the painting, the network graph shown in Figure 2 was used to 'paint' by moving the image along the X,Y position of the mouse. The resulting artwork is shown in Figure 7.
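The white-background removal mentioned above amounts to making near-white pixels fully transparent. The following minimal sketch works on a plain list of RGBA tuples; the function name and the threshold value are assumptions, and the actual Pillow calls used for the artwork are not documented here.

```python
def white_to_transparent(pixels, threshold=250):
    """Set alpha to 0 for pixels whose R, G and B are all >= threshold.

    pixels: iterable of (r, g, b, a) tuples with values from 0 to 255.
    The threshold of 250 is an illustrative choice, not a known setting.
    """
    result = []
    for r, g, b, a in pixels:
        if r >= threshold and g >= threshold and b >= threshold:
            result.append((r, g, b, 0))  # near-white: make transparent
        else:
            result.append((r, g, b, a))  # keep the pixel unchanged
    return result
```

With Pillow, such tuples can be obtained from an image converted with `convert("RGBA")`, and the processed list can be written back before saving as a PNG, which preserves transparency.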
Figure 7. 'Helitron Art' created by Martin Calvino based on complex network graphs of Wikipedia pages and their re-mixing determined by maize Helitron DNA sequences post-alignment.
Conclusion_
The artwork shown in Figure 7 exemplifies the broad possibilities of co-opting algorithmic tools used in technology and science for the creation of interesting visual art. In doing this, the artist becomes a humanizing force, translating scientific concepts into color, form and texture for emotional impact. At the same time, the artist played a central role in portraying the scientist behind the work being explored for art creation, making him a central piece of the visual work. Interestingly, although each visual element of the painting was derived from technical or scientific information, its re-contextualization and re-mixing served a different but related purpose: art-science.
References_
Kapitonov, V and Jurka, J (2001). Rolling-circle transposons in eukaryotes. PNAS, (98) 15: 8714-8719
Feschotte, C and Pritham, EJ (2007). DNA transposons and the evolution of eukaryotic genomes. Annual Review of Genetics, 41: 331-368
Du, C., Caronna, J., He, L., Dooner, HK (2008). Computational prediction and molecular confirmation of Helitron transposons in the maize genome. BMC Genomics, 9: 51
Du, C., Fefelova, N., Caronna, J., He, L., Dooner, HK (2009). The polychromatic Helitron landscape of the maize genome. PNAS, (106) 47: 19916-19920
Xiong, W., He, L., Lai, J., Dooner, HK., Du, C (2014). HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. PNAS, (111) 28: 10263-10268
Xiong, W., Dooner, HK., Du, C (2016). Rolling-circle amplification of centromeric Helitrons in plant genomes. The Plant Journal 88: 1038-1045
Xiong, W., and Du, C (2014). Mining hidden polymorphic sequence motifs from divergent plant Helitrons. Mobile Genetic Elements, (4) 5: 1-5
Zinoviev, D (2018). Complex Network Analysis in Python. The Pragmatic Programmers (Raleigh, North Carolina)
Calvino, M (2019). https://www.martincalvino.co/single-post/2019/03/23/Natural-Language-Processing-of-facebook-messages-and-their-inclusion-into-an-abstract-painting
Calvino, M (2018). https://www.martincalvino.co/single-post/2018/03/31/Procedurally-generated-artworks-based-on-multiple-sequence-alignment-of-orthologous-gene-copies
Calvino, M (2018). https://www.martincalvino.co/single-post/2018/03/09/Auditory-perception-of-reduction-in-genome-diversity-as-consequence-of-plant-domestication
Calvino, M (2017). https://www.martincalvino.co/single-post/2017/07/05/Post-polyploidy-subgenome-evolution-of-Glitch-Art
Wikipedia Module in Python_
https://pypi.org/project/wikipedia/ (Accessed during April 8-11 of 2019)
Python Image Library_
https://pillow.readthedocs.io/en/stable/ (Accessed during April 8-11 of 2019)
Processing_
NetworkX_
Gephi_