您的当前位置:首页正文

Genome Rearrangement Sorting Signed Permutations by Reversals (Project Report)

2024-07-11 来源:好走旅游网
GenomeRearrangement

SortingSignedPermutationsbyReversals

(ProjectReport)

CS290IBioinformaticsSpring’03

AnkurJainHodaMokhtar

ComputerScienceDept.

UniversityofCaliforniaSantaBarbara

ankurj,hmokhtar@cs.ucsb.edu

June8,2003

Abstract

Amodeltostudygenomeevolutioncanbeconstructedbyrepresentinggenomesaspermutationsofgenes,andcomputetheminimumnumberofrearrangementsneededtoconvertonegenometotheother.Underthismodel,theshorterthedistance,thelesseristheevolutionarydistancebetweenthetwospecies.However,everystudyofgenomerearrangementsinvolvessolvingthecombinatorialproblemreferredtoassortingbyreversalsthatfindstheseriesofrequiredrearrangements.Boththeproblemsofcomputingthereversaldistanceandfindingtheoptimalsequenceofsortingreversalshavebeenextensivelystudiedduringthelastdecade.Inthis

.Ourapproachintegratespreviouspaperwepresentapolynomialtimealgorithmforsortingbyreversals

workpresentedin[Ber01]togetherwithagreedybreakpointmethod[Sha95].Themaincontributionofthepaperisobtainingnearoptimalsolutionsusingthesimpleapproachproposed.AbasicfeatureoftheprposedapproachisavoidingtheexpensivestepsintroducedinHannenhalliandPevzneralgorithm[HP95].

1Introduction

Themotivationforstudyinggenomerearrangementarisesfromthefastprogressofthehumangenomeproject,andthefactthatgeneticandDNAdataonmanymodelorganismsareaccumulatingrapidly.Analysisofgenomerearrangementsinmolecularbiologystartedinthelate1930s,whenDobzhanskyandSturtevantpublishedamilestonepaperpresentingarearrangementscenariowith17inversionsbetweenthespeciesofDrosophila[HP95].Inthelate1980’sJeffreyPalmerdemonstratedthatdifferentspeciesmayhaveessentiallythesamegenes,butthegeneordermaydifferbetweenspecies[HRR97].Thisdiscoveryandmanyotherstudiesinthelastdecadeprovedthatthereversaldistancebetweentwogenomesofdifferentspeciesfaithfullyreflectstheevolutionarydistancebetweenthem.

MathematicalanalysisofgenomerearrangementproblemswasinitiatedbySankoff[SCA90,KS95].In1997,Caprara[Cap97]establishedthattheproblemofsortingunsignedpermutationsisNP-hard,usingsomeofthecombinatorialtoolsdevelopedbyBafnaandPevzner[BP96].In1995,HannenhalliandPevzner

fortheproblemofsortingasignedpermutation[HP95]proposedthefirstpolynomialtimealgorithm

byreversals.In1996,BermanandHannenhalli[BH96]describedafasterimplementationthatfindsa

timeusingUnion-Finddatastructure,whereistheinverseminimumsequenceofreversalsin

algorithmforofAckerman’sfunction.Laterin1997,Kaplan,ShamirandTarjan[HRR97]gavean

sortingasignedpermutationofelements,therebyimprovinguponthepreviousbest-knownbound.In

1

2001,Bergeron[Ber01,BS01]presentedabit-vectorimplementationthatcomputestheoptimalsequenceofreversalstosortasignedpermutationusingoriented-sorttechnique.

Sortingbyreversalscanbeformulatedas:Givenapermutationfindtheminimumnumberofreversalstoobtaintheidentitypermutation.Intheremainderofthispaperweusethetermsdefinedbelow.Asignedpermutationisapermutationofintegersbetweenand,

Permutationelementsareprovidedwithaplusorminussigndenotingthegeneorientationalongthechro-mosome.

isanoperationthatreversesablockofconsecutiveelementsstartingfrompositionAreversal

topositionin,togetherwithchangingthesignsoftheelementsbetween

(1)

istheminimumnumberofreversalsthattransformtotheidentitypermuta-Areversaldistancetion.Clearly,sortingbyreversalsisbasicallyfindingtherequiredsequenceofreversalstoobtainthereversaldistance.Areversaliscalledsafeifitreducesreversaldistance(i.e.issafeif).

bedefinedas-=.Extendthepermutationbyaddingand.WeLet

callapairofelementsforofisanadjacencyif.Otherwise,wecallitabreakpoint.

suchthatarebreakpointsandnobreakpointliesAstripofisaninterval

betweenthem.Inotherwords,astripisamaximalrunofincreasingordecreasingelements.

2ASimplePolynomialTimeSortingbyReversalsTechnique

Inthissectionwedescribeourapproachtowardssortingbyreversals.Thealgorithmweproposeconsistsof2stages:1)removingorientedpairsfromthepermutationusingthebit-vectormethod,2)completingthesorting(ifrequired)usingthegreedybreakpointalgorithm.

ThemotivationbehindthisworkistoavoidexpensiveoperationsintroducedbyHannenhalliet.al[HP95],andKaplanet.al[HRR97].Thoseoperationarefindingandclearinghurdles(i.e.unorientedcomponentsintheoverlapgraphwhoseelementsareconsecutive)andfortresses(permutationswithodd

timebesidethetimeneededfornumberofsuperhurdles).Performingthosegraphoperationsrequire

removingtheorientedpairsafterwards.

Inthefirststageofthealgorithmwestartwithimplicitlyconstructinganoverlapgraphusingabitmatrix.Anoverlapgraph(OV-Graph)isanedge-coloredgraphconstructedontheunsignedversionofthesignedpermutation.isobtainedbyreplacingeachpositiveelementbythesequence

,andeachnegativeelementsby.AnOV-Graphofapermutationoflengthhas

edgesandvertices.IngeneralanOV-Graphhastwotypesofedges,blackedgesthatjoinevery

where.Acharacteristicofotherelementinwhilegrayedgesthatjoinadjacentelements

OV-Graphisthateachvertexhasadegreeofexactly2givingauniquecycledecompositionofthegraph.A

withoppositesignsiscalledanorientedpair.pairofconsecutiveintegers,thatis

IntheremainingdiscussionwefocusonarcoverlapgraphwhereeveryedgeintheOV-GraphisavertexinthearcOV-Graph,edgesinthisgraphjoinverticeswhosegrayedgeswereintersectingintheOV-Graph.

Example2.1Considerthepermutationsignedpermutationisgivenby:forisshowninFigure1.

.Thecorrespondingun-.TheOV-Graph

Ingeneral,avertexinthearcOV-Graphisanorientedvertexifitenclosesanoddnumberofelementsin.Theorientedsortalgorithmmakesuseof2basicfacts:1)avertexhasanodddegreeiffitisoriented,

2

Figure1:OverlapGraph

2)ifoneperformsareversalcorrespondingtoanorientedvertex,eachvertexadjacenttowillchangeitsorientation[Ber01].Thesetwofactstogetherwiththetheoremin[BS01]thatstatedthatareversalthatmaximizesthenumberoforientedverticesissafeformthetheoreticalbackbonefortheorientedsortingtechnique.

TheorientedsortalgorithmstartswithconstructinganadjacencymatrixforthearcOV-Graph.Twoextra

areappendedattheendcorrespondingtotheparityandscoreofthevertex.Theparityspecifiesrows

whetheravertexhasevenorodddegree(i.e.unorientedororientedresp.).Thescoreisdefinedasthedifferencebetweenthenumberoforientedandunorientedneighborsofthevertex.

.ThecorrespondingarcOV-GraphisshowninExample2.2Consider

Figure2,whereorientedverticesaremarkedinblack.ThebitmatrixforisshowninFigure3.

Figure2:ArcOverlapGraph

Theorientedsortingalgorithmisasfollows:

WHILEthereexistsanorientedvertexinthebitmatrixBEGIN

Letbeavertexwithmaximumscore

Figure3:BitMatrixofOV-graphinFigure2

FOReachvertexadjacenttoIFisorientedBEGIN

END

ELSE

BEGIN

3

END

END

Applyingtheabovealgorithmonasignedpermutationremovesallorientedpairsandresultsinan

unsignedpermutation.Ifisstillunsortedwepresentittoagreedybreakpointalgorithmthatguaranteestoobtaintheidentityattheend.In[Sha95]itisprovedthatreversaldistanceobtainedfromthegreedybreakpointalgorithmisatmosttwiceasbadastheoptimalsolution.Thegreedyalgorithmproceedsasfollows:

BEGIN

WHILEcontainsabreakpointDOBEGIN

Letbeareversalthatremovesthemostbreakpointsof,resolvingtiesamongthosethatremoveonebreakpointinfavorofreversalsthatleaveadecreasingstrip.

ENDEND

Thecomplexityofbothalgorithmsis.Therefore,theoverallcomplexityofourintegratedap-makingitcompetitivewithothertechniquesthataremorecomplextoapply.proachis

3ExperimentalResults

Inthissectionwepresenttheresultsobtainedbyapplyingourapproachonbothrealandsyntheticdatasets.

ThesyntheticdatathatweusedintheexperimentswereobtainedusingtheJavaappletprovidedbyMantin[Man].Wegeneratedrandomsingedpermutationsofvariouslengthanddifferentevolutionaryrate(i.e.numberofreversals).Ontheotherhand,therealdatawereobtainedfromthetestsetsprovidedbyGRAPPA[oNMoTaA01]fordifferent”Campanulaceae”species(flowerplant),MGR[GP02]forhuman-and-mousegeneorder,and[BP02]forHerpesvirusdata.TheexperimentswereconductedonLinuxplatformonaGHzAMDAthlon(tm)XP+processorwithMBmemory.

Weconducted3differentexperimentsonsyntheticdata.Thefirstexperimentconsideredthesortingperformanceoftheorientedsortingtechniquedefinedasthenumberofsuccessfullysortedpermutations.Figure4presentstheresultsproducedbyourimplementationonfilesofrandompermutationsofdifferent

)eachfilewithpermutationsgeneratedusing[Man]withvalueslengths(

.Theresultsshowthatontheaverage%ofthesequenceswerefortheevolutionrate””:

completelysorted.Theconclusionfromthisexperimentisthatonly%ofthesequencesneedtobefurthertreatedusinggreedybreakpointmethod.

Thesecondexperimentconsideredtherunningtimeoftheorientedsortingtechnique.Figure5presentstheresultsoftherunningtimeofthealgorithmunderthesamesettingsinExperiment1.WefocusedonlyonthesequencesthatwerecompletelysortedinExperiment1.Ourconclusionsisthatastheevolutionrateincreases,therequiredsortingtimeincreasesaswell.Besides,thetimeremainstobealmostlinearfor

whichisanoverestimateofatypicalgenomicsequencelength(i.e.humansequencesoflengthupto

genesmatchingwithmouse).genomehasonly

Thethirdexperimentconsideredtherelationbetweenevolutionrateandtime.Wegeneratedfilesof

andcomputedthetimeneededforvaryinginincrementsof.Therandompermutationsoflength

4

Performance5040Time300250k=20k=30No. SortedTime (sec)3020100k=20k=30k=40503942421003637332003740374003231388003232341600283034200150100500501002004008001600Sequence Lengthk=20k=30k=40k=40Sequence LengthFigure4:SortingPerformance

Evolution Rate vs Time43.532.521.510.502006001003004005007008009001000Figure5:RunningTimePerformance

time (sec)Length=1000kFigure6:EvolutionRateversusTimeFigure7:HerpesVirusPhylogeneticTree

(i.e.slopetendstomainobservationisthatsaturationstateisreachedasevolutionrateapproaches

)decreaseafter

Next,weconducted3experimentsonrealdata.Inexperiment1weconsidereddifferentspeciesof”Campanulaceae”[oNMoTaA01],wecomputedthereversaldistancebetweenthedifferentspeciesandTobacco(i.e.identitypermutation).WecomparedtheresultswiththeoptimalresultsfromGRIMM[G.02].AlthoughTable1showsgoodcomparativeresults,theorientedsortalgorithmfailedtosortPlatyncodon,LegousiaandCodonopsisspeciescompletely.

Experiment2wasdoneonHerpesVirusdata[BP02].Weobservedthatthereversaldistancesandreversalsequencesin[BP02]matchedexactlywiththeonesperformedbyouralgorithm.TheresultsforthisexperimentareshowninFigure7.

ThelastexperimentwasdoneonHumanandMousegeneorderdata[GP02].Inthisexperimentwein-SpeciesCyanathusTriodanusSymphanraReversalDistancetoTobacco

111312GRIMMOptimalDistance

111312

Table1:ReversalDistancetoTobacco

5

12 13 14 15 -9 -8 -7 -6 47 48 -46 -45 -44 -11 -10 -58 -57 -56 92 93 -95 -94 -21 -20 -5 -4 -3 -2 -1 34 35 41 42 43 36 37 38 -64 -63 61 62 65 66 67 68 90 91 -55 -54 51 52 53 39 40 -60 -59 -77 -76 -19 -18 16 17 -97 -96 -75 -74 -73 24 25 78 79 -83 -82 -81 -80 84 85 86 87 -28 -27 -26 22 23 98 99 69 70 -72 -71 -33 -32 -31 -30 -29 88 89 -50 -49 -105 -104 106 107 108 114 115 -117 -116 -103 -102 109 110 111 112 113 -101 -100 118 119 120 121 122 123 (mouse genome and human is identity)40 reversals0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 9339 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 6768 69 7094 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 1243 reversalsIdentityFigure8:Human-MouseGeneOrder

tegratedtheorientedsortalgorithmtogetherwiththegreedybreakpointmethodtoconvertmousetohuman.First,weobtainedasequencefreeoforientedpairsin40reversals.Thissequencerequired3morereversalsbythegreedyalgorithmtoobtainidentity.The43reversalsarecomparabletotheoptimalfigureof41fromGRIMM.TheblockssortedbythegreedyalgorithmareindicatedinitalicsandboldfontinFigure8.

4ConclusionsandFutureWork

Inthisworkwepresentedatechniquethatintegratesthebit-matrixorientedsortingandgreedybreakpointreversaltechniques.Theapproachproposedwastestedonbothrealandsyntheticdataandwasableachievenearoptimalresultsforsortingsignedpermutations.Abasicfeatureoftheproposedmethodisbeingsimpleandrelativelyfast.

Webelievethatthetechniqueweimplementedcanprovidegoodresultsandthatfurtherexperimentscanstrengthenourclaim.Aspartoftheproposedfuturework,wethinkthatimplementingothertechniquescanhelptopresentabettercomparativeanalysis.Besides,wefinditusefultotrytestingthetechniqueondifferentdatasetsincludingexonorderratherthangeneorder,aswellasconsideringdifferentspeciesandtryingtocomputereversaldistancebetweenthemtoconstructphylogenetictrees.

References

[Ber01][BH96]

AnneBergeron.AveryelementarypresentationoftheHannenhalli-Pevznertheory.LectureNotesinComputerScience,2001.

P.BermanandS.Hannenhalli.Fastsortingbyreversal.InD.S.HirschbergandE.W.My-ers,editors,Proceedingsofthe7thAnnualSymposiumonCombinatorialPatternMatching,number1075,pages168–185,LagunaBeach,CA,1996.Springer-Verlag,Berlin.V.BafnaandP.A.Pevzner.Genomerearragementsandsortingbyreversals.SIAMJ.Comput.,25:272–289,1996.

GuillaumeBourqueandPavelA.Pevzner.Genome-scaleevolution:Reconstructinggeneordersintheancestralspecies.GenomeResearch,12(1):26–36,January2002.

[BP96][BP02]

6

[BS01][Cap97][G.02][GP02][HP95][HRR97]

AnneBergeronandF.Strasbourg.Experimentsincomputingsequencesofinversions.InWABI01,pages166–176,Berlin,2001.Springer-Verlag.

A.Caprara.Sortingbyreversalsisdifficult.InNM.ACMPress,editor,1stConferenceonComputationalMolecularBiology(RECOMB97),pages75–83,SantaFe,1997.

TeslerG.Grimm:genomerearrangementswebserver.Bioinformatics,18(3):492–493,March2002.

BourqueG.andPevznerP.Genome-scaleevolution:Reconstructinggeneordersintheancestralspecies.GenomeRes.,12(1):26–36,2002.

SridharHannenhalliandPavelPevzner.Transformingcabbageintoturnip:polynomialalgorithmforsortingsignedpermutationsbyreversals.pages178–189,1995.

KaplanH.,ShamirR.,andTarjanR.Fasterandsimpleralgorithmforsortingsignedper-mutationsbyreversals.InSODA:ACM-SIAMSymposiumonDiscreteAlgorithms(ACon-ferenceonTheoreticalandExperimentalAnalysisofDiscreteAlgorithms),1997.

J.KececiogluandD.Sankoff.Exactandapproximationalgorithmsforsortingbyreversals,withapplicationtogenomerearrangement.Algorithmica,13:180–210,1995.

ItsikMantin.Javaappletfor:Analgorithmforsortingsignedpermutationsbyreversals.http://www.math.tau.ac.il/rshamir/GR/.

[KS95][Man]

[oNMoTaA01]TheUniversityofNewMexicoandTheUniversityofTexasatAustin.Grappa:

Genomerearrangementsanalysisunderparsimonyandotherphylogeneticalgorithms.http://www.cs.unm.edu/moret/GRAPPA/,2001.

[SCA90][Sha95]

D.Sankoff,R.Cedergren,andY.Abel.Genomicdivergencethroughgenerearrangement.MethodsinEnzymology,183:428–438,1990.

RonShamir.Lecturenotes:Genomehttp://www.math.tau.ac.il/rshamir/algmb/95/94-95.html,1995.

rearrangement.

7

因篇幅问题不能全部显示,请点此查看更多更全内容