CAFE: Computational Analysis of gene Family Evolution Tutorial

Jan 20, 2016

Contents

1 This tutorial
  1.1 Dependencies
  1.2 Commands
2 Preparing the input
  2.1 Downloading the data
  2.2 Identifying gene families
    2.2.1 Moving all longest isoforms into a single file
    2.2.2 All-by-all BLAST
    2.2.3 Clustering sequences with mcl
    2.2.4 Final parsing of mcl’s output
  2.3 Estimating a species tree
    2.3.1 Making the species tree ultrametric
3 Running CAFE
  3.1 Estimating the birth-death parameter λ
    3.1.1 Estimating a single λ for the whole tree
    3.1.2 Setting λ to a previously estimated value to deal with families with large numbers of gene copies
    3.1.3 Estimating multiple λ for different parts of the tree
  3.2 Comparing models with one vs multiple λ
  3.3 Estimating separate birth (λ) and death (µ) parameters
  3.4 Estimating an error model to account for genome assembly error

1 This tutorial

This document contains a tutorial[1] that should help you get started with CAFE. The tutorial is divided into two parts:

1. Preparing an input dataset that CAFE understands: this is most of the work, and makes use of auxiliary Python scripts (which we provide) and a few other programs;
2. Running CAFE: performing basic evolutionary inferences about gene family evolution.
1.1 Dependencies

The tutorial assumes you are running a Unix-based operating system. It also assumes you have a local working version of CAFE (please see CAFE’s manual for instructions on how to install it), as well as a few other programs that are necessary for the first part of the tutorial:

• Python 2.7.x;
• BLAST;
• mcl;
• r8s.

1.2 Commands

Copying and pasting commands from this .pdf document can add whitespace and quotes that might lead to errors. So please feel free to copy and paste from the file we provide, all_cafe_commands.txt.

[1] If you have any comments or suggestions, please email fkmendes@indiana.edu.

2 Preparing the input

2.1 Downloading the data

This tutorial and the scripts we provide assume you will use sequences in FASTA format (.fa) downloaded from Ensembl using the Biomart tool. To download the protein sequences from, say, cat (Felis catus), you must navigate Biomart:

1. CHOOSE DATABASE → Ensembl Genes 87 → CHOOSE DATASET → Cat genes;
2. Then click Attributes → Sequences: Peptide → Header information: Gene ID + CDS length (uncheck Transcript ID);
3. Finally, click Results. If you have a good internet connection, choose Compressed file (.gz), otherwise choose Compressed web file and provide your email address.

We are going to analyze data from 12 species: mouse, rat, cow, horse, cat, marmoset, macaque, gibbon, baboon, orangutan, chimpanzee, and human. It should not take too long to download the 12 files, but we are providing you with a tarball (twelve_spp_proteins.tar.gz) containing all of them. To decompress twelve_spp_proteins.tar.gz (it will create its own folder), open your shell, move to the folder into which you downloaded the file, and enter:

    $ tar -zxvf twelve_spp_proteins.tar.gz
    $ for i in `ls -1 twelve_spp_proteins/*.tar.gz`; do tar -zxvf $i -C twelve_spp_proteins/; done

The first command unpacks the outer tarball; the second loops over the per-species tarballs inside the new folder and unpacks each one in place.

2.2 Identifying gene families

Identifying gene families within and among species requires
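
For the decompression step in section 2.1, the two shell commands can also be scripted in Python. The following is a minimal sketch, not part of the scripts provided with the tutorial: the function name extract_nested_tarballs is hypothetical, and it assumes that the outer tarball unpacks into a folder named after itself (as twelve_spp_proteins.tar.gz is described to do) and that the inner archives end in .tar.gz. It uses only the standard library.

```python
import glob
import os
import tarfile


def extract_nested_tarballs(outer_tarball, dest_dir="."):
    """Unpack outer_tarball into dest_dir, then unpack every inner
    *.tar.gz found in the folder the outer tarball creates.

    Returns the sorted list of inner tarballs that were unpacked.
    (Hypothetical helper; not one of the tutorial's own scripts.)
    """
    # Step 1: equivalent of `tar -zxvf <outer_tarball>`
    with tarfile.open(outer_tarball, "r:gz") as outer:
        outer.extractall(dest_dir)

    # Assumption: the outer tarball creates a folder named after itself,
    # e.g. twelve_spp_proteins.tar.gz -> twelve_spp_proteins/
    folder_name = os.path.basename(outer_tarball)[: -len(".tar.gz")]
    folder = os.path.join(dest_dir, folder_name)

    # Step 2: equivalent of the shell `for` loop over the inner tarballs
    inner_tarballs = sorted(glob.glob(os.path.join(folder, "*.tar.gz")))
    for path in inner_tarballs:
        with tarfile.open(path, "r:gz") as inner:
            inner.extractall(folder)
    return inner_tarballs
```

Calling extract_nested_tarballs("twelve_spp_proteins.tar.gz") from the download folder should mirror the two commands in section 2.1. As with any archive, only unpack tarballs from a source you trust.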
