The Speex Codec Manual

(version 1.0.4)

Jean-Marc Valin
14th July 2004

Copyright (c) 2002-2004 Jean-Marc Valin.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Section, with no Front-Cover Texts, and with no Back-Cover. A copy of the license is included in the section entitled "GNU Free Documentation License".

Contents

1 Introduction to Speex
2 Feature description
3 Command-line encoder/decoder
    3.1 speexenc
    3.2 speexdec
4 Programming with Speex (the libspeex API)
    4.1 Encoding
    4.2 Decoding
    4.3 Codec Options (speex_*_ctl)
    4.4 Mode queries
    4.5 Packing and in-band signalling
5 Formats and standards
    5.1 RTP Payload Format
    5.2 MIME Type
    5.3 Ogg file format
6 Introduction to CELP Coding
    6.1 Linear Prediction (LPC)
    6.2 Pitch Prediction
    6.3 Innovation Codebook
    6.4 Analysis-by-Synthesis and Error Weighting
7 Speex narrowband mode
    7.1 LPC Analysis
    7.2 Pitch Prediction (adaptive codebook)
    7.3 Innovation Codebook
    7.4 Bit allocation
    7.5 Perceptual enhancement
8 Speex wideband mode (sub-band CELP)
    8.1 Linear Prediction
    8.2 Pitch Prediction
    8.3 Excitation Quantization
    8.4 Bit allocation
A FAQ
B Sample code
    B.1 sampleenc.c
    B.2 sampledec.c
C IETF RTP Profile
D Speex License
E GNU Free Documentation License

List of Tables

1 In-band signalling codes
2 Ogg/Speex header packet
3 Bit allocation for narrowband modes
4 Quality versus bit-rate
5 Bit allocation for high-band in wideband mode

1 Introduction to Speex

The Speex project (http://www.speex.org/) was started because there was a need for a speech codec that was open-source and free from software patents. These are essential conditions for being used by any open-source software. There is already Vorbis that does general audio, but it is not really suitable for speech. Also, unlike many other speech codecs, Speex is not targeted at cell phones (not many open-source cell phones anyway :-)) but rather at voice over IP (VoIP) and file-based compression.

As design goals, we wanted to have a codec that would allow both very good quality speech and low bit-rate (unfortunately not at the same time!), which led us to developing a codec with multiple bit-rates. Of course very good quality also meant we had to do wideband (16 kHz sampling rate) in addition to narrowband (telephone quality, 8 kHz sampling rate).

Designing for VoIP instead of cell phone use means that Speex must be robust to lost packets, but not to corrupted ones since packets either arrive unaltered or don’t arrive at all. Also, the idea was to have a reasonable complexity and memory requirement without compromising too much on the efficiency of the codec.

All this led us to the choice of CELP as the encoding technique to use for Speex. One of the main reasons is that CELP has long proved that it could do the job and scale well to both low bit-rates (think DoD CELP @ 4.8 kbps) and high bit-rates (think G.728 @ 16 kbps).

The main characteristics can be summarized as follows:

• Free software/open-source, patent and royalty-free
• Integration of narrowband and wideband in the same bit-stream
• Wide range of bit-rates available (from 2 kbps to 44 kbps)
• Dynamic bit-rate switching and Variable Bit-Rate (VBR)
• Voice Activity Detection (VAD, integrated with VBR)
• Variable complexity
• Ultra-wideband mode at 32 kHz (up to 48 kHz)
• Intensity stereo encoding option

This document is divided in the following way. Section 2 describes the different Speex features and defines some terms that will be used in later sections. Section 3 provides information about the standard command-line tools, while section 4 contains information about programming using the Speex API. Section 5 has some information related to Speex and standards. The three last sections describe the internals of the codec and require some signal processing knowledge. Section 6 explains the general idea behind CELP, while sections 7 and 8 are specific to Speex. Note that if you are only interested in using Speex, those three last sections are not required.

2 Feature description

This section explains the main Speex features, as well as some concepts in speech coding that help better understand the next sections.

Sampling rate

Speex is mainly designed for 3 different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

Quality

Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.

Complexity (variable)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way that’s similar to the -1 to -9 options to gzip and bzip2 compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.

Variable Bit-Rate (VBR)

Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the “difficulty” of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s, f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there’s no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

Average Bit-Rate (ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.

Voice Activity Detection (VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called “comfort noise generation” (CNG).

Discontinuous Transmission (DTX)

Discontinuous transmission is an addition to VAD/VBR operation that makes it possible to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).
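As a quick check of the 250 bps figure, assuming the 20 ms frame duration that section 7 gives for narrowband mode (so 50 frames per second), the arithmetic can be sketched as:

```latex
\[
\frac{5\ \text{bits}}{20\ \text{ms}}
  = 5\ \text{bits/frame} \times 50\ \text{frames/s}
  = 250\ \text{bps}
\]
```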

Perceptual enhancement

Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (if you use SNR), but in the end it still sounds better (subjective improvement).

Algorithmic delay

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of “look-ahead” required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don’t account for the CPU time it takes to encode or decode the frames.
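The narrowband figure can be cross-checked against section 7, under the assumption that the look-ahead equals the half frame by which the LPC analysis window extends past the current frame (section 7.1):

```latex
\[
\underbrace{20\ \text{ms}}_{\text{frame size}}
  + \underbrace{\tfrac{1}{2} \times 20\ \text{ms}}_{\text{look-ahead}}
  = 30\ \text{ms}
\]
```

The wideband value of 34 ms is taken as given here; this sketch does not attempt to derive the additional 4 ms.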

3 Command-line encoder/decoder

The base Speex distribution includes a command-line encoder (speexenc) and decoder (speexdec). This section describes how to use these tools.

3.1 speexenc

The speexenc utility is used to create Speex files from raw PCM or wave files. It can be used by calling:

    speexenc [options] input_file output_file

The value ’-’ for input_file or output_file corresponds respectively to stdin and stdout. The valid options are:

--narrowband (-n)      Tell Speex to treat the input as narrowband (8 kHz). This is the default
--wideband (-w)        Tell Speex to treat the input as wideband (16 kHz)
--ultra-wideband (-u)  Tell Speex to treat the input as “ultra-wideband” (32 kHz)
--quality n            Set the encoding quality (0-10), default is 8
--bitrate n            Encoding bit-rate (use bit-rate n or lower)
--vbr                  Enable VBR (Variable Bit-Rate), disabled by default
--abr n                Enable ABR (Average Bit-Rate) at n kbps, disabled by default
--vad                  Enable VAD (Voice Activity Detection), disabled by default
--dtx                  Enable DTX (Discontinuous Transmission), disabled by default
--nframes n            Pack n frames in each Ogg packet (this saves space at low bit-rates)
--comp n               Set encoding speed/quality tradeoff. The higher the value of n, the slower the encoding (default is 3)
-V                     Verbose operation, print bit-rate currently in use
--help (-h)            Print the help
--version (-v)         Print version information

Speex comments

--comment              Add the given string as an extra comment. This may be used multiple times.
--author               Author of this track.
--title                Title for this track.

Raw input options

--rate n               Sampling rate for raw input
--stereo               Consider raw input as stereo
--le                   Raw input is little-endian
--be                   Raw input is big-endian
--8bit                 Raw input is 8-bit unsigned
--16bit                Raw input is 16-bit signed

3.2 speexdec

The speexdec utility is used to decode Speex files and can be used by calling:

    speexdec [options] speex_file [output_file]

The value ’-’ for input_file or output_file corresponds respectively to stdin and stdout. Also, when no output_file is specified, the file is played to the soundcard. The valid options are:

--enh                  Enable post-filter (default)
--no-enh               Disable post-filter
--force-nb             Force decoding in narrowband
--force-wb             Force decoding in wideband
--force-uwb            Force decoding in ultra-wideband
--mono                 Force decoding in mono
--stereo               Force decoding in stereo
--rate n               Force decoding at n Hz sampling rate
--packet-loss n        Simulate n % random packet loss
-V                     Verbose operation, print bit-rate currently in use
--help (-h)            Print the help
--version (-v)         Print version information

4 Programming with Speex (the libspeex API)

This section explains how to use the Speex API. Examples of code can also be found in appendix B.

4.1 Encoding

In order to encode speech using Speex, you first need to:

    #include <speex.h>

You then need to declare a Speex bit-packing struct

    SpeexBits bits;

and a Speex encoder state

    void *enc_state;

The two are initialized by:

    speex_bits_init(&bits);
    enc_state = speex_encoder_init(&speex_nb_mode);

For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used by the mode you are using. You can get that value in the frame_size variable with:

    speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);

Once the initialization is done, for every input frame:

    speex_bits_reset(&bits);
    speex_encode(enc_state, input_frame, &bits);
    nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);

where input_frame is a (float *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.

After you’re done with the encoding, free all resources with:

    speex_bits_destroy(&bits);
    speex_encoder_destroy(enc_state);

That’s about it for the encoder.

4.2 Decoding

In order to decode speech using Speex, you first need to:

    #include <speex.h>

You also need to declare a Speex bit-packing struct

    SpeexBits bits;

and a Speex decoder state

    void *dec_state;

The two are initialized by:

    speex_bits_init(&bits);
    dec_state = speex_decoder_init(&speex_nb_mode);

For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable with:

    speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);

There is also a parameter that can be set for the decoder: whether or not to use a perceptual post-filter. This can be set by:

    speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);

where enh is an int with value 0 to have the post-filter disabled and 1 to have it enabled.

Again, once the decoder initialization is done, for every input frame:

    speex_bits_read_from(&bits, input_bytes, nbBytes);
    speex_decode(dec_state, &bits, output_frame);

where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (float *) and points to the area where the decoded speech frame will be written. A NULL value in place of the bits argument (&bits) indicates that we don’t have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.

After you’re done with the decoding, free all resources with:

    speex_bits_destroy(&bits);
    speex_decoder_destroy(dec_state);

4.3 Codec Options (speex_*_ctl)

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:

    void speex_encoder_ctl(void *encoder, int request, void *ptr);
    void speex_decoder_ctl(void *encoder, int request, void *ptr);

The different values of request allowed are (note that some only apply to the encoder or the decoder):

SPEEX_SET_ENH**           Set perceptual enhancer to on (1) or off (0) (integer)
SPEEX_GET_ENH**           Get perceptual enhancer status (integer)
SPEEX_GET_FRAME_SIZE      Get the frame size used for the current mode (integer)
SPEEX_SET_QUALITY*        Set the encoder speech quality (integer 0 to 10)
SPEEX_GET_QUALITY*        Get the current encoder speech quality (integer 0 to 10)
SPEEX_SET_MODE*†
SPEEX_GET_MODE*†
SPEEX_SET_LOW_MODE*†
SPEEX_GET_LOW_MODE*†
SPEEX_SET_HIGH_MODE*†
SPEEX_GET_HIGH_MODE*†
SPEEX_SET_VBR*            Set variable bit-rate (VBR) to on (1) or off (0) (integer)
SPEEX_GET_VBR*            Get variable bit-rate (VBR) status (integer)
SPEEX_SET_VBR_QUALITY*    Set the encoder VBR speech quality (float 0 to 10)
SPEEX_GET_VBR_QUALITY*    Get the current encoder VBR speech quality (float 0 to 10)
SPEEX_SET_COMPLEXITY*     Set the CPU resources allowed for the encoder (integer 1 to 10)
SPEEX_GET_COMPLEXITY*     Get the CPU resources allowed for the encoder (integer 1 to 10)
SPEEX_SET_BITRATE*        Set the bit-rate to use to the closest value not exceeding the parameter (integer in bps)
SPEEX_GET_BITRATE         Get the current bit-rate in use (integer in bps)
SPEEX_SET_SAMPLING_RATE   Set real sampling rate (integer in Hz)
SPEEX_GET_SAMPLING_RATE   Get real sampling rate (integer in Hz)
SPEEX_RESET_STATE         Reset the encoder/decoder state to its original state (zeros all memories)
SPEEX_SET_VAD*            Set voice activity detection (VAD) to on (1) or off (0) (integer)
SPEEX_GET_VAD*            Get voice activity detection (VAD) status (integer)
SPEEX_SET_DTX*            Set discontinuous transmission (DTX) to on (1) or off (0) (integer)
SPEEX_GET_DTX*            Get discontinuous transmission (DTX) status (integer)
SPEEX_SET_ABR*            Set average bit-rate (ABR) to a value n in bits per second (integer in bps)
SPEEX_GET_ABR*            Get average bit-rate (ABR) setting (integer in bps)

*   applies only to the encoder
**  applies only to the decoder
†   normally only used internally

4.4 Mode queries

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:

    void speex_mode_query(SpeexMode *mode, int request, void *ptr);

The admissible values for request are (unless otherwise noted, the values are returned through ptr):

SPEEX_MODE_FRAME_SIZE     Get the frame size (in samples) for the mode
SPEEX_SUBMODE_BITRATE     Get the bit-rate for a submode number specified through ptr (integer in bps).

4.5 Packing and in-band signalling

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in table 4. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn’t involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.

It is also possible to send in-band “messages” to the other side. All these messages are encoded as “pseudo-frames” of mode 14 which contain a 4-bit message type code, followed by the message. Table 1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.

Code  Size (bits)  Content
0     1            Asks decoder to set perceptual enhancement off (0) or on (1)
1     1            Asks (if 1) the encoder to be less “aggressive” due to high packet loss
2     4            Asks encoder to switch to mode N
3     4            Asks encoder to switch to mode N for low-band
4     4            Asks encoder to switch to mode N for high-band
5     4            Asks encoder to switch to quality N for VBR
6     4            Request acknowledge (0=no, 1=all, 2=only for in-band data)
7     4            Asks encoder to set CBR (0), VAD (1), DTX (3), VBR (5), VBR+DTX (7)
8     8            Transmit (8-bit) character to the other end
9     8            Intensity stereo information
10    16           Announce maximum bit-rate acceptable (N in bytes/second)
11    16           reserved
12    32           Acknowledge receiving packet N
13    32           reserved
14    64           reserved
15    64           reserved

Table 1: In-band signalling codes

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn’t know how to interpret it.
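The layout described above (a mode-14 pseudo-frame carrying a 4-bit type code and a payload, then the terminator code 15 on 5 bits) can be sketched with a small bit writer. This is only an illustration, not libspeex code: BitWriter and the bw_* functions are hypothetical names, the most-significant-bit-first packing order is an assumption, and so is the 5-bit width used for mode 14 (the text only states that width for the terminator).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type (not part of libspeex): an MSB-first bit writer. */
typedef struct {
    unsigned char *buf;   /* output buffer, zeroed by bw_init */
    size_t cap;           /* capacity in bytes */
    size_t nbits;         /* number of bits written so far */
} BitWriter;

void bw_init(BitWriter *w, unsigned char *buf, size_t cap)
{
    memset(buf, 0, cap);
    w->buf = buf;
    w->cap = cap;
    w->nbits = 0;
}

/* Append the `count` low-order bits of `value`, most significant bit first
 * (assumed bit order). Handles at most 16 bits per call. */
void bw_put(BitWriter *w, unsigned value, int count)
{
    assert(count >= 0 && count <= 16);
    assert(w->nbits + (size_t)count <= w->cap * 8);
    for (int i = count - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            w->buf[w->nbits >> 3] |= (unsigned char)(0x80u >> (w->nbits & 7));
        w->nbits++;
    }
}

/* In-band message: mode 14 (5 bits assumed), 4-bit type code, then payload. */
void bw_put_inband(BitWriter *w, unsigned code, unsigned payload, int payload_bits)
{
    assert(code <= 15);
    bw_put(w, 14, 5);
    bw_put(w, code, 4);
    bw_put(w, payload, payload_bits);
}

/* Terminator: the code 15 written with 5 bits. */
void bw_put_terminator(BitWriter *w)
{
    bw_put(w, 15, 5);
}

/* Number of bytes needed to hold the bits written so far. */
size_t bw_nbytes(const BitWriter *w)
{
    return (w->nbits + 7) / 8;
}
```

For instance, sending the character 'A' with code 8 and then a terminator takes 5 + 4 + 8 + 5 = 22 bits, which fit in 3 bytes. The sketch leaves the unused bits of the last byte at zero and only handles payloads of up to 16 bits, so the 32-bit and 64-bit messages of table 1 would need several calls.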

5 Formats and standards

Speex can encode speech in both narrowband and wideband and provides different bit-rates. However, not all features need to be supported by a certain implementation or device. In order to be called “Speex compatible” (whatever that means), an implementation must implement at least a basic set of features.

At the minimum, all narrowband modes of operation MUST be supported at the decoder. This includes the decoding of a wideband bit-stream by the narrowband decoder [1]. If present, a wideband decoder MUST be able to decode a narrowband stream, and MAY either be able to decode all wideband modes or be able to decode the embedded narrowband part of all modes (which includes ignoring the high-band bits).

For encoders, at least one narrowband or wideband mode MUST be supported. The main reason why all encoding modes do not have to be supported is that some platforms may not be able to handle the complexity of encoding in some modes.

5.1 RTP Payload Format

The RTP payload draft is included in appendix C and the latest version is available at http://www.speex.org/drafts/latest. This draft has been sent (2003/02/26) to the Internet Engineering Task Force (IETF) and will be discussed at the March 18th meeting in San Francisco.

5.2 MIME Type

For now, you should use the MIME type audio/x-speex for Speex. We will apply for type audio/speex in the near future.

5.3 Ogg file format

Speex bit-streams can be stored in Ogg files. In this case, the first packet of the Ogg file contains the Speex header described in table 2. All integer fields in the headers are stored as little-endian. The speex_string field must contain the string “Speex   ” (with 3 trailing spaces), which identifies the bit-stream. The next field, speex_version, contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more info. The beginning of stream (b_o_s) flag is set to 1 for the header. The header packet has packetno=0 and granulepos=0.

The second packet contains the Speex comment header. The format used is the Vorbis comment format described here: http://www.xiph.org/ogg/vorbis/doc/v-comment.html. This packet has packetno=1 and granulepos=0.

The third and subsequent packets each contain one or more (number found in header) Speex frames. These are identified with packetno starting from 2 and the granulepos is the number of the last sample encoded in that packet. The last of these packets has the end of stream (e_o_s) flag set to 1.

[1] The wideband bit-stream contains an embedded narrowband bit-stream which can be decoded alone.

Field                   Type     Size
speex_string            char[]   8
speex_version           char[]   20
speex_version_id        int      4
header_size             int      4
rate                    int      4
mode                    int      4
mode_bitstream_version  int      4
nb_channels             int      4
bitrate                 int      4
frame_size              int      4
vbr                     int      4
frames_per_packet       int      4
extra_headers           int      4
reserved1               int      4
reserved2               int      4

Table 2: Ogg/Speex header packet
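To make the layout of table 2 concrete, here is a minimal sketch of a parser for the 80-byte header packet (8 + 20 + 13 × 4 bytes), reading each integer as little-endian as the text requires. SpxHeaderSketch, spx_read_le32 and spx_parse_header are hypothetical names introduced for this illustration; the real declarations live in speex_header.[ch] and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPX_HEADER_BYTES 80   /* 8 + 20 + 13 * 4, summed from table 2 */

/* Hypothetical in-memory form of the header; field names follow table 2. */
typedef struct {
    char    speex_string[8];
    char    speex_version[20];
    int32_t speex_version_id;
    int32_t header_size;
    int32_t rate;
    int32_t mode;
    int32_t mode_bitstream_version;
    int32_t nb_channels;
    int32_t bitrate;
    int32_t frame_size;
    int32_t vbr;
    int32_t frames_per_packet;
    int32_t extra_headers;
    int32_t reserved1;
    int32_t reserved2;
} SpxHeaderSketch;

/* Read a 32-bit little-endian integer regardless of host byte order. */
int32_t spx_read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Returns 0 on success, -1 if the packet is too short or the
 * identification string is not "Speex" followed by 3 spaces. */
int spx_parse_header(const unsigned char *pkt, size_t len, SpxHeaderSketch *h)
{
    assert(pkt != NULL && h != NULL);
    if (len < SPX_HEADER_BYTES || memcmp(pkt, "Speex   ", 8) != 0)
        return -1;

    memcpy(h->speex_string, pkt, 8);
    memcpy(h->speex_version, pkt + 8, 20);

    /* The 13 integer fields follow in table order, starting at offset 28. */
    int32_t *fields[13] = {
        &h->speex_version_id, &h->header_size, &h->rate, &h->mode,
        &h->mode_bitstream_version, &h->nb_channels, &h->bitrate,
        &h->frame_size, &h->vbr, &h->frames_per_packet,
        &h->extra_headers, &h->reserved1, &h->reserved2
    };
    for (int i = 0; i < 13; i++)
        *fields[i] = spx_read_le32(pkt + 28 + 4 * i);
    return 0;
}
```

Reading the fields byte by byte, rather than casting the packet to a struct, avoids depending on the host's byte order or struct padding.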

6 Introduction to CELP Coding

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

1. The use of a linear prediction (LP) model to model the vocal tract
2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
3. The search performed in closed-loop in a “perceptually weighted domain”

This section describes the basic ideas behind CELP. Note that it’s still incomplete.

6.1 Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal x[n] using a linear combination of its past samples:

\[ y[n] = \sum_{i=1}^{N} a_i x[n-i] \]

where y[n] is the linear prediction of x[n]. The prediction error is thus given by:

\[ e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i] \]

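The goal of the LPC analysis is to find the best prediction coefficients $a_i$ which minimize the quadratic error function:

\[ E = \sum_{n=0}^{L-1} \left[ e[n] \right]^2 = \sum_{n=0}^{L-1} \left[ x[n] - \sum_{i=1}^{N} a_i x[n-i] \right]^2 \]

That can be done by making all derivatives $\partial E / \partial a_i$ equal to zero:

\[ \frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left[ x[n] - \sum_{i=1}^{N} a_i x[n-i] \right]^2 = 0 \]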

The $a_i$ filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation R(m) of the signal x[n].

\[ R(m) = \sum_{i=0}^{N-1} x[i] x[i-m] \]

For an order N filter, we have:

\[
\mathbf{R} = \begin{bmatrix}
R(0) & R(1) & \cdots & R(N-1) \\
R(1) & R(0) & \cdots & R(N-2) \\
\vdots & \vdots & \ddots & \vdots \\
R(N-1) & R(N-2) & \cdots & R(0)
\end{bmatrix},
\qquad
\mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}
\]

The filter coefficients $a_i$ are found by solving the system $\mathbf{R}\mathbf{a} = \mathbf{r}$. What the Levinson-Durbin algorithm does here is making the solution to the problem $O(N^2)$ instead of $O(N^3)$ by exploiting the fact that matrix $\mathbf{R}$ is Toeplitz Hermitian. Also, it can be proven that all the roots of A(z) are within the unit circle, which means that 1/A(z) is always stable. This is in theory; in practice because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply R(0) by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.
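The two steps described above, computing the auto-correlation and then solving Ra = r with the Levinson-Durbin recursion, can be sketched as follows. This is a textbook formulation written for illustration, not the Speex source; autocorr, levinson_durbin and LPC_MAX_ORDER are hypothetical names, and the array a stores $a_1 \ldots a_N$ in a[0..N-1] with the sign convention of this section (y[n] is the sum of $a_i$ x[n-i]).

```c
#include <assert.h>

#define LPC_MAX_ORDER 32   /* arbitrary limit chosen for this sketch */

/* R[m] = sum over i of x[i] * x[i-m], for m = 0..order.
 * Samples before the start of the buffer are treated as zero,
 * so the sum starts at i = m. */
void autocorr(const double *x, int len, double *R, int order)
{
    for (int m = 0; m <= order; m++) {
        double acc = 0.0;
        for (int i = m; i < len; i++)
            acc += x[i] * x[i - m];
        R[m] = acc;
    }
}

/* Solve R a = r by Levinson-Durbin in O(N^2) operations.
 * On return a[i-1] holds a_i; the return value is the final
 * prediction error energy. */
double levinson_durbin(const double *R, int order, double *a)
{
    double tmp[LPC_MAX_ORDER];
    double err = R[0];

    assert(order >= 1 && order <= LPC_MAX_ORDER);
    for (int i = 0; i < order; i++)
        a[i] = 0.0;
    if (err <= 0.0)
        return 0.0;                 /* silent input: keep all-zero coefficients */

    for (int i = 1; i <= order; i++) {
        /* Reflection coefficient for stage i. */
        double acc = R[i];
        for (int j = 1; j < i; j++)
            acc -= a[j - 1] * R[i - j];
        double k = acc / err;

        /* Update a_1..a_{i-1} from the previous stage, then set a_i = k. */
        for (int j = 1; j < i; j++)
            tmp[j - 1] = a[j - 1] - k * a[i - j - 1];
        for (int j = 1; j < i; j++)
            a[j - 1] = tmp[j - 1];
        a[i - 1] = k;

        err *= (1.0 - k * k);
        if (err <= 0.0)
            break;                  /* perfectly predictable: stop early */
    }
    return err;
}
```

A caller that wants the stabilisation trick mentioned in the text could scale R[0] by a factor such as 1.0001 between the two calls; the sketch leaves that to the caller.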

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual).

\[ x[n] = \sum_{i=1}^{N} a_i x[n-i] + e[n] \]

In the z-domain, this can be expressed as

\[ x(z) = \frac{1}{A(z)} e(z) \]

where A(z) is defined as

\[ A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i} \]

We usually refer to A(z) as the analysis filter and 1/A(z) as the synthesis filter. The whole process is called short-term prediction as it predicts the signal x[n] using only the N past samples, where N is usually around 10.
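The relationship between the analysis filter A(z) and the synthesis filter 1/A(z) can be illustrated with two direct-form loops. This is a sketch under the assumption of zero initial state (samples before the buffer are taken as zero); lpc_analysis and lpc_synthesis are hypothetical names, not libspeex functions.

```c
#include <assert.h>

/* Analysis filter A(z): e[n] = x[n] - sum_{i=1}^{N} a_i x[n-i].
 * a[i-1] holds a_i; samples before the buffer start are taken as zero. */
void lpc_analysis(const double *x, int len, const double *a, int order,
                  double *e)
{
    assert(len >= 0 && order >= 0);
    for (int n = 0; n < len; n++) {
        double pred = 0.0;
        for (int i = 1; i <= order && i <= n; i++)
            pred += a[i - 1] * x[n - i];
        e[n] = x[n] - pred;
    }
}

/* Synthesis filter 1/A(z): x[n] = e[n] + sum_{i=1}^{N} a_i x[n-i].
 * The output buffer x must not overlap the input buffer e. */
void lpc_synthesis(const double *e, int len, const double *a, int order,
                   double *x)
{
    assert(len >= 0 && order >= 0);
    for (int n = 0; n < len; n++) {
        double pred = 0.0;
        for (int i = 1; i <= order && i <= n; i++)
            pred += a[i - 1] * x[n - i];
        x[n] = e[n] + pred;
    }
}
```

Feeding the residual produced by lpc_analysis back through lpc_synthesis with the same coefficients reproduces the original samples, which is the sense in which the excitation plus the filter coefficients describe the signal.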

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization; in particular, it is easy to keep the filter stable.

6.2 Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal e[n] by a gain times the past of the excitation:

\[ e[n] \simeq p[n] = \beta e[n-T] \]

where T is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from e[n-T] with $T \gg N$.

6.3 Innovation Codebook

The final excitation e[n] will be the sum of the pitch prediction and an innovation signal c[n] taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

\[ e[n] = p[n] + c[n] = \beta e[n-T] + c[n] \]

The quantization of c[n] is where most of the bits in a CELP codec are allocated. It represents the information that couldn’t be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal X(z) as

\[ X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)} \]

6.4 Analysis-by-Synthesis and Error Weighting

Most (if not all) modern audio codecs attempt to “shape” the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That’s why instead of minimizing the simple quadratic error

\[ E = \sum_{n} \left( x[n] - \overline{x}[n] \right)^2 \]

where $\overline{x}[n]$ is the encoded signal, we minimize the error for the perceptually weighted signal

\[ X_w(z) = W(z) X(z) \]

where W(z) is the weighting filter, usually of the form

\[ W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1) \]

with control parameters $\gamma_1 > \gamma_2$. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

\[ A_{noise}(z) = \frac{1}{W(z)} = \frac{A(z/\gamma_2)}{A(z/\gamma_1)} \]

If a filter A(z) has (complex) poles at $p_i$ in the z-plane, the filter $A(z/\gamma)$ will have its poles at $p'_i = \gamma p_i$, making it a flatter version of A(z).

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters (T, $\beta$) and innovation signal c[n], we do not work by making the excitation e[n] as close as possible to the original one (which would be simpler), but apply the synthesis (and weighting) filter and try making $X_w(z)$ as close to the original as possible.
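Since A(z/γ) is obtained from A(z) by replacing each coefficient $a_i$ with $a_i \gamma^i$, the weighting filter of eq. 1 is straightforward to apply in the time domain. The following sketch shows one way to do it, assuming zero initial filter state; lpc_bw_expand, perceptual_weight and W_MAX_ORDER are hypothetical names chosen for this illustration.

```c
#include <assert.h>

#define W_MAX_ORDER 32   /* arbitrary limit chosen for this sketch */

/* Coefficients of A(z/gamma): a_i is replaced by a_i * gamma^i.
 * a[i-1] holds a_i, with A(z) = 1 - sum a_i z^-i. */
void lpc_bw_expand(const double *a, int order, double gamma, double *out)
{
    double g = gamma;
    for (int i = 0; i < order; i++) {
        out[i] = a[i] * g;
        g *= gamma;
    }
}

/* Apply W(z) = A(z/gamma1) / A(z/gamma2) to x, starting from zero state:
 *   w[n] = x[n] - sum num_i x[n-i] + sum den_i w[n-i]
 * where num and den are the expanded coefficient sets.
 * The buffers x and w must not overlap. */
void perceptual_weight(const double *x, int len, const double *a, int order,
                       double gamma1, double gamma2, double *w)
{
    double num[W_MAX_ORDER], den[W_MAX_ORDER];

    assert(order >= 0 && order <= W_MAX_ORDER);
    lpc_bw_expand(a, order, gamma1, num);
    lpc_bw_expand(a, order, gamma2, den);

    for (int n = 0; n < len; n++) {
        double acc = x[n];
        for (int i = 1; i <= order && i <= n; i++)
            acc += den[i - 1] * w[n - i] - num[i - 1] * x[n - i];
        w[n] = acc;
    }
}
```

With the values quoted later for Speex narrowband (γ1 = 0.9 and γ2 = 0.6), a first-order filter with $a_1$ = 0.5 gives a numerator coefficient of 0.45 and a denominator coefficient of 0.3.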

7 Speex narrowband mode

This section looks at how Speex works for narrowband (8 kHz sampling rate) operation. The frame size for this mode is 20 ms, corresponding to 160 samples. Each frame is also subdivided into 4 sub-frames of 40 samples each.

Also many design decisions were based on the original goals and assumptions:

• Minimizing the amount of information extracted from past frames (for robustness to packet loss)
• Dynamically-selectable codebooks (LSP, pitch and innovation)
• Sub-vector fixed (innovation) codebooks

7.1 LPC Analysis

An LPC analysis is first performed on an (asymmetric Hamming) window that spans all of the current frame and half a frame in advance. The LPC coefficients are then converted to Line Spectral Pair (LSP), a representation that is more robust to quantization. The LSP’s are considered to be associated to the 4th sub-frame and the LSP’s associated to the first 3 sub-frames are linearly interpolated using the current and previous LSP’s.

The LSP’s are encoded using 30 bits for higher quality modes and 18 bits for lower quality, through the use of a multi-stage split-vector quantizer. For the lower quality modes, the 10 coefficients are first quantized with 6 bits and the error is then divided in two 5-coefficient sub-vectors. Each of them is quantized with 6 bits, for a total of 18 bits. For the higher quality modes, the remaining error on both sub-vectors is further quantized with 6 bits each, for a total of 30 bits.

The perceptual weighting filter W(z) used by Speex is derived from the LPC filter A(z) and corresponds to the one described by eq. 1 with $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$. We can use the unquantized A(z) filter since the weighting filter is only used in the encoder.

7.2 Pitch Prediction (adaptive codebook)

Speex uses a 3-tap prediction for pitch. That is, the pitch prediction signal p[n] is obtained from the past of the excitation by:

\[ p[n] = \beta_0 e[n-T-1] + \beta_1 e[n-T] + \beta_2 e[n-T+1] \]

where T is the pitch period and the $\beta_i$ are the prediction (filter) taps. It is worth noting that when the pitch is smaller than the sub-frame size, we repeat the excitation at a period T. For example, when $n-T+1 \geq 0$, we use $n-2T+1$ instead. The period and quantized gains are determined in closed loop (analysis-by-synthesis). In most modes, the pitch period is encoded with 7 bits in the [17, 144] range and the $\beta_i$ coefficients are vector-quantized using 7 bits at higher bit-rates (15 kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband and below).
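The 3-tap predictor and its repetition rule for short pitch periods can be sketched as below. This is an illustration rather than the Speex implementation: pitch_predict_3tap is a hypothetical name, the history buffer layout is an assumption, and the loop that keeps stepping back by T until the index falls in the past is a generalisation of the single "use n-2T+1" step given in the text.

```c
#include <assert.h>

/*
 * hist holds the past excitation: hist[hist_len - k] is e[-k] for
 * k = 1..hist_len, where n = 0 is the first sample of the current sub-frame.
 * Computes p[n] = beta[0] e[n-T-1] + beta[1] e[n-T] + beta[2] e[n-T+1]
 * for n = 0..len-1.
 */
void pitch_predict_3tap(const double *hist, int hist_len, int T,
                        const double beta[3], double *p, int len)
{
    assert(T >= 1 && hist_len >= T + 1);
    for (int n = 0; n < len; n++) {
        double acc = 0.0;
        for (int tap = 0; tap < 3; tap++) {
            int idx = n - T - 1 + tap;   /* n-T-1, n-T, n-T+1 */
            while (idx >= 0)             /* inside the current sub-frame: */
                idx -= T;                /* repeat the past at period T   */
            acc += beta[tap] * hist[hist_len + idx];
        }
        p[n] = acc;
    }
}
```

With a 40-sample sub-frame and the minimum period of 17, an index can need more than one step back, which is why the sketch uses a loop instead of a single subtraction.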

7.3 Innovation Codebook

In Speex, the innovation signal is quantized using sub-vector shape-only vector quantization (VQ). That means that the innovation signal is divided in sub-vectors (of size 5 to 20) and quantized using a codebook that represents both the shape and the gain at the same time. This saves many bits that would otherwise be allocated for a separate gain at the price of a slight increase in complexity.

7.4 Bit allocation

There are 7 different narrowband bit-rates defined for Speex, ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used for speech. The bit-allocation for each mode is detailed in table 3. Each frame starts with the mode ID encoded with 4 bits which allows a range from 0 to 15, though only the first 7 values are used (the others are reserved). The parameters are listed in the table in the order they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters. The parameters for a certain sub-frame are all packed before the following sub-frame is packed. Note that the “OL” in the parameter description means that the parameter is an open loop estimation based on the whole frame.

Parameter        Update rate    0    1    2    3    4    5    6    7    8
Wideband bit     frame          1    1    1    1    1    1    1    1    1
Mode ID          frame          4    4    4    4    4    4    4    4    4
LSP              frame          0   18   18   18   18   30   30   30   18
OL pitch         frame          0    7    7    0    0    0    0    0    7
OL pitch gain    frame          0    4    0    0    0    0    0    0    4
OL Exc gain      frame          0    5    5    5    5    5    5    5    5
Fine pitch       sub-frame      0    0    0    7    7    7    7    7    0
Pitch gain       sub-frame      0    0    5    5    5    7    7    7    0
Innovation gain  sub-frame      0    1    0    1    1    3    3    3    0
Innovation VQ    sub-frame      0    0   16   20   35   48   64   96   10
Total            frame          5   43  119  160  220  300  364  492   79

Table 3: Bit allocation for narrowband modes
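The “Total” row of table 3 and the bit-rates of table 4 follow from the per-parameter figures: frame-rate parameters are counted once, sub-frame parameters four times, and there are 50 frames of 20 ms per second. The sketch below reproduces that arithmetic; the function and table names are hypothetical and the numbers are copied from table 3.

```c
#include <assert.h>

#define NB_MODES 9
#define SUBFRAMES_PER_FRAME 4
#define FRAMES_PER_SECOND 50   /* 20 ms frames */

/* Per-frame parameters from table 3, in order: wideband bit, mode ID, LSP,
 * OL pitch, OL pitch gain, OL excitation gain. */
static const int nb_frame_params[NB_MODES][6] = {
    {1, 4,  0, 0, 0, 0},
    {1, 4, 18, 7, 4, 5},
    {1, 4, 18, 7, 0, 5},
    {1, 4, 18, 0, 0, 5},
    {1, 4, 18, 0, 0, 5},
    {1, 4, 30, 0, 0, 5},
    {1, 4, 30, 0, 0, 5},
    {1, 4, 30, 0, 0, 5},
    {1, 4, 18, 7, 4, 5},
};

/* Per-sub-frame parameters from table 3, in order: fine pitch, pitch gain,
 * innovation gain, innovation VQ. */
static const int nb_subframe_params[NB_MODES][4] = {
    {0, 0, 0,  0},
    {0, 0, 1,  0},
    {0, 5, 0, 16},
    {7, 5, 1, 20},
    {7, 5, 1, 35},
    {7, 7, 3, 48},
    {7, 7, 3, 64},
    {7, 7, 3, 96},
    {0, 0, 0, 10},
};

/* Total number of bits in one narrowband frame of the given mode. */
int nb_frame_bits(int mode)
{
    int bits = 0;
    assert(mode >= 0 && mode < NB_MODES);
    for (int i = 0; i < 6; i++)
        bits += nb_frame_params[mode][i];
    for (int i = 0; i < 4; i++)
        bits += SUBFRAMES_PER_FRAME * nb_subframe_params[mode][i];
    return bits;
}

/* Bit-rate in bits per second for the given mode. */
int nb_bitrate_bps(int mode)
{
    return nb_frame_bits(mode) * FRAMES_PER_SECOND;
}
```

For mode 3, for example, this gives 28 frame-level bits plus 4 × 33 sub-frame bits, i.e. 160 bits per frame or 8,000 bps.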

So far, no MOS (Mean Opinion Score) subjective evaluation has been performed for Speex. In order to give an idea of the quality achievable with it, table 4 presents my own subjective opinion on it. It should be noted that different people will perceive the quality differently and that the person that designed the codec often has a bias (one way or another) when it comes to subjective evaluation. Last thing, it should be noted that for most codecs (including Speex) encoding quality sometimes varies depending on the input. Note that the complexity is only approximate (within 0.5 mflops and using the lowest complexity setting). Decoding requires approximately 0.5 mflops in most modes (1 mflops with perceptual enhancement).

Mode  Bit-rate (bps)  mflops  Quality/description
0     250             N/A     No transmission (DTX)
1     2,150           6       Vocoder (mostly for comfort noise)
2     5,950           9       Very noticeable artifacts/noise, good intelligibility
3     8,000           10      Artifacts/noise sometimes noticeable
4     11,000          14      Artifacts usually noticeable only with headphones
5     15,000          11      Need good headphones to tell the difference
6     18,200          17.5    Hard to tell the difference even with good headphones
7     24,600          14.5    Completely transparent for voice, good quality music
8     3,950           10.5    Very noticeable artifacts/noise, good intelligibility
9     N/A             N/A     reserved
10    N/A             N/A     reserved
11    N/A             N/A     reserved
12    N/A             N/A     reserved
13    N/A             N/A     Application-defined, interpreted by callback or skipped
14    N/A             N/A     Speex in-band signaling
15    N/A             N/A     Terminator code

Table 4: Quality versus bit-rate

7.5 Perceptual enhancement

This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. For that reason, the implementation provided and described here should only be considered as a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter S(z) = 1/A(z) is replaced by an enhanced filter

\[ S'(z) = \frac{A(z/a_2)\, A(z/a_3)}{A(z)\, A(z/a_1)} \]

where $a_1$ and $a_2$ depend on the mode in use and $a_3 = \frac{1}{r}\left(1 - \frac{1 - r a_1}{1 - r a_2}\right)$ with r = .9. The second part of the enhancement consists of using a comb filter to enhance the pitch in the excitation domain.

8 Speex wideband mode (sub-band CELP)

For wideband, the Speex approach uses a quadrature mirror filter (QMF) to split the band in two. The 16 kHz signal is thus divided into two 8 kHz signals, one representing the low band (0-4 kHz), the other the high band (4-8 kHz). The low band is encoded with the narrowband mode described in section 7 in such a way that the resulting “embedded narrowband bit-stream” can also be decoded with the narrowband decoder. Since the low band encoding has already been described, only the high band encoding is described in this section.

8.1 Linear Prediction

The linear prediction part used for the high-band is very similar to what is done for narrowband. The only difference is that we use only 12 bits to encode the high-band LSP’s using a multi-stage vector quantizer (MSVQ). The first level quantizes the 10 coefficients with 6 bits and the error is then quantized using 6 bits, too.

8.2 Pitch Prediction

That part is easy: there’s no pitch prediction for the high-band. There are two reasons for that. First, there is usually little harmonic structure in this band (above 4 kHz). Second, it would be very hard to implement since the QMF folds the 4-8 kHz band into 4-0 kHz (reversing the frequency axis), which means that the harmonics are no longer located at multiples of the fundamental (pitch).
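The effect of the folding on harmonics can be illustrated with a one-line mapping. This sketch assumes an idealised fold in which a component at f Hz in the 4-8 kHz band appears at 8000 - f Hz, as implied by the reversed frequency axis described above; folded_freq_hz is a hypothetical helper, not part of Speex.

```c
#include <assert.h>

/* Frequency (in Hz) at which a high-band component at f_hz appears after
 * the 4-8 kHz band has been folded into 4-0 kHz (idealised mirror). */
int folded_freq_hz(int f_hz)
{
    assert(f_hz >= 4000 && f_hz <= 8000);
    return 8000 - f_hz;
}
```

Taking a 300 Hz fundamental as an example, its harmonics at 4200 Hz and 4500 Hz land at 3800 Hz and 3500 Hz after folding. They are still 300 Hz apart, but neither is a multiple of 300 Hz, so a predictor that looks for energy at multiples of the pitch would no longer line up with them.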

8.3 Excitation Quantization

The high-band excitation is coded in the same way as for narrowband.

8.4 Bit allocation

For the wideband mode, the entire narrowband frame is packed before the high-band is encoded. The narrowband part of the bit-stream is as defined in table 3. The high-band follows, as described in table 5. This also means that a wideband frame may be correctly decoded by a narrowband decoder with the only caveat that if more than one frame is packed in the same packet, the decoder will need to skip the high-band parts in order to sync with the bit-stream.

Parameter        Update rate    0    1    2    3    4
Wideband bit     frame          1    1    1    1    1
Mode ID          frame          3    3    3    3    3
LSP              frame          0   12   12   12   12
Excitation gain  sub-frame      0    5    4    4    4
Excitation VQ    sub-frame      0    0   20   40   80
Total            frame          4   36  112  192  352

Table 5: Bit allocation for high-band in wideband mode

AFAQ26

A FAQ

Vorbis is open-source and patent-free; why do we need Speex?

Vorbis is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason Speex can achieve much better results than Vorbis on speech, typically 2-4 times higher compression at equal quality.

Isn’t there a GPL implementation of the GSM-FR codec? Why is Speex necessary?

First of all, it’s not clear whether GSM-FR is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

Under what license is Speex released?

As of version 1.0 beta 1, Speex is released under Xiph’s version of the (revised) BSD license (see Appendix D). This license is one of the most permissive open-source licenses.

Am I allowed to use Speex in commercial software?

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can’t use our name to promote your product without authorization. For more details, see the license in Appendix D.

Ogg, Speex, Vorbis, what’s the difference?

Ogg is a container format for holding multimedia data. Vorbis is an audio codec that uses Ogg to store its bit-streams as files, hence the name Ogg Vorbis. Speex also uses the Ogg format to store its bit-streams as files, so technically they would be “Ogg Speex” files (I prefer to call them just Speex files). One difference with Vorbis, however, is that Speex is less tied to Ogg. Actually, if what you do is Voice over IP (VoIP), you don’t need Ogg at all.

What’s the extension for Speex?

Speex files have the .spx extension. Note, however, that the Speex tools (speexenc, speexdec) do not rely on the extension at all, so any extension will work.


Can I use Speex for compressing music?

Just like Vorbis is not really adapted to speech, Speex is really not adapted for music. In most cases, you’ll be better off with Vorbis when it comes to music.

I converted some MP3’s to Speex and the quality is bad. What’s wrong?

This is called transcoding and it will always result in much poorer quality than the original MP3. Unless you have a really good (size) reason to do so, never transcode speech. This is even valid for self transcoding (tandeming), i.e. if you decode a Speex file and re-encode it again at the same bit-rate, you will lose quality.

Does Speex run on Windows?

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

Why is encoding so slow compared to decoding?

For most kinds of compression, encoding is inherently slower than decoding. In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries. On the other hand, at decoding all that needs to be done is to look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.
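The asymmetry between the two operations is easy to see in code. The two functions below are a generic sketch of a codebook search and a codebook lookup, not code taken from libspeex; the names and the flat array layout (n_entries vectors of dim floats stored back to back) are assumptions for the example.

```c
#include <float.h>
#include <stddef.h>

/* Encoder side: exhaustive search. The cost grows with n_entries * dim. */
int codebook_search(const float *cb, int n_entries, int dim,
                    const float *target)
{
    int best = 0;
    float best_dist = FLT_MAX;
    for (int i = 0; i < n_entries; i++) {
        const float *entry = cb + (size_t)i * (size_t)dim;
        float dist = 0.0f;
        for (int j = 0; j < dim; j++) {
            float e = target[j] - entry[j];
            dist += e * e;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

/* Decoder side: direct lookup. The cost does not depend on n_entries. */
const float *codebook_lookup(const float *cb, int dim, int index)
{
    return cb + (size_t)index * (size_t)dim;
}
```

For a 256-entry codebook of 10-sample vectors, the search above performs 2560 multiply-accumulates per vector, while the lookup is a single pointer computation.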

Why is Speex so slow on my iPaq (or insert any platform without an FPU)?

Well, the parenthesis provides the answer: no FPU (floating-point unit). The Speex code makes heavy use of floating-point operations. On devices with no FPU, all floating-point instructions need to be emulated. This is a very time consuming operation.

I’m getting unusual background noise (hiss) when using libspeex in my application. How do I fix that?

One of the causes could be scaling of the input speech. Speex expects signals to have a ±2^15 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. ±1.0), you will suffer important quantization noise. A good target is to have a dynamic range around ±8000 which is large enough, but small enough to make sure there’s no clipping when converting back to signed short.
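As an illustration, an application whose audio pipeline produces samples normalized to ±1.0 could rescale them before handing them to the encoder. The helper below is a sketch under that assumption; SPEEX_TARGET_PEAK and scale_for_speex are names invented for this example and are not part of the libspeex API.

```c
#define SPEEX_TARGET_PEAK 8000.0f  /* target range suggested in the answer above */

/* Scale samples assumed to lie in [-1.0, 1.0] to roughly +/-8000. */
void scale_for_speex(const float *in, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * SPEEX_TARGET_PEAK;
}
```

The scaled buffer is what would then be passed to speex_encode() in place of the original one.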


I get very distorted speech when using libspeex in my application. What’s wrong?

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. Finally, if the input speech has an amplitude close to ±2^15, it is possible that at decoding the amplitude is a bit higher than that, causing clipping when saving as 16-bit PCM.
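For the clipping case, one common remedy is to saturate the decoded samples instead of letting the float-to-short conversion overflow. The function below is a minimal sketch of that idea; float_to_pcm16 is a hypothetical helper name, not a libspeex function.

```c
/* Convert one decoded sample to 16-bit PCM, saturating instead of wrapping.
 * Values are rounded to the nearest integer. */
short float_to_pcm16(float x)
{
    if (x > 32767.0f)
        return 32767;
    if (x < -32768.0f)
        return -32768;
    return (short)(x < 0.0f ? x - 0.5f : x + 0.5f);
}
```

In a decoding loop this would replace a plain assignment such as out[i]=output[i]. Saturation still distorts the peaks, so leaving some headroom at the encoder input remains the better fix.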

How does Speex compare to other proprietary codecs?

It’s hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either). Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute/modify it at will.

Can Speex pass DTMF?

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I’d say don’t go below the 15 kbps mode if you want DTMF to be transmitted correctly. DTMF at 8 kbps may work but your mileage may vary. Also, make sure you don’t use the lowest complexity (see SPEEX_SET_COMPLEXITY or the --comp option), as it causes significant noise.

Can Speex pass V.9x modem signals correctly?

If I could do that I’d be very rich by now :-)

What is your (Jean-Marc) relationship with the University of Sherbrooke and how does Speex fit into that?

Currently (2003/03/09), I’m doing a Ph.D. at the University of Sherbrooke in mobile robotics. Although I did my master’s with the Sherbrooke speech coding group (in speech enhancement, not coding), I am not associated with them anymore. It should not be understood that they or the University of Sherbrooke endorse the Speex project in any way. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

CELP, ACELP, what’s the difference?

CELP stands for “Code Excited Linear Prediction”, while ACELP stands for “Algebraic Code Excited Linear Prediction”. That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since it is patented, it cannot be used in Speex.


B Sample code

This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:

% sampleenc in_file.sw | sampledec out_file.sw

where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness).

B.1 sampleenc.c

sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is NOT compatible with that of speexenc/speexdec.

#include <speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
   char *inFile;
   FILE *fin;
   short in[FRAME_SIZE];
   float input[FRAME_SIZE];
   char cbits[200];
   int nbBytes;
   /*Holds the state of the encoder*/
   void *state;
   /*Holds bits so they can be read and written to by the Speex routines*/
   SpeexBits bits;
   int i, tmp;

   /*Create a new encoder state in narrowband mode*/
   state = speex_encoder_init(&speex_nb_mode);

   /*Set the quality to 8 (15 kbps)*/
   tmp=8;
   speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

   inFile = argv[1];
   fin = fopen(inFile, "r");

   /*Initialization of the structure that holds the bits*/
   speex_bits_init(&bits);
   while (1)
   {
      /*Read a 16 bits/sample audio frame*/
      fread(in, sizeof(short), FRAME_SIZE, fin);
      if (feof(fin))
         break;

      /*Copy the 16 bits values to float so Speex can work on them*/
      for (i=0;i<FRAME_SIZE;i++)
         input[i]=in[i];

      /*Flush all the bits in the struct so we can encode a new frame*/
      speex_bits_reset(&bits);

      /*Encode the frame*/
      speex_encode(state, input, &bits);

      /*Copy the bits to an array of char that can be written*/
      nbBytes = speex_bits_write(&bits, cbits, 200);

      /*Write the size of the frame first. This is what sampledec expects but
        it's likely to be different in your own application*/
      fwrite(&nbBytes, sizeof(int), 1, stdout);
      /*Write the compressed data*/
      fwrite(cbits, 1, nbBytes, stdout);
   }

   /*Destroy the encoder state*/
   speex_encoder_destroy(state);
   /*Destroy the bit-packing struct*/
   speex_bits_destroy(&bits);
   fclose(fin);
   return 0;
}

B.2 sampledec.c

sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is NOT compatible with that of speexenc/speexdec.

#include <speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
   char *outFile;
   FILE *fout;
   /*Holds the audio that will be written to file (16 bits per sample)*/
   short out[FRAME_SIZE];
   /*Speex handles samples as float, so we need an array of floats*/
   float output[FRAME_SIZE];
   char cbits[200];
   int nbBytes;
   /*Holds the state of the decoder*/
   void *state;
   /*Holds bits so they can be read and written to by the Speex routines*/
   SpeexBits bits;
   int i, tmp;

   /*Create a new decoder state in narrowband mode*/
   state = speex_decoder_init(&speex_nb_mode);

   /*Set the perceptual enhancement on*/
   tmp=1;
   speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

   outFile = argv[1];
   fout = fopen(outFile, "w");

   /*Initialization of the structure that holds the bits*/
   speex_bits_init(&bits);
   while (1)
   {
      /*Read the size encoded by sampleenc, this part will likely be
        different in your application*/
      fread(&nbBytes, sizeof(int), 1, stdin);
      if (feof(stdin))
         break;
      fprintf(stderr, "nbBytes: %d\n", nbBytes);

      /*Read the "packet" encoded by sampleenc*/
      fread(cbits, 1, nbBytes, stdin);
      /*Copy the data into the bit-stream struct*/
      speex_bits_read_from(&bits, cbits, nbBytes);

      /*Decode the data*/
      speex_decode(state, &bits, output);

      /*Copy from float to short (16 bits) for output*/
      for (i=0;i<FRAME_SIZE;i++)
         out[i]=output[i];

      /*Write the decoded audio to file*/
      fwrite(out, sizeof(short), FRAME_SIZE, fout);
   }

   /*Destroy the decoder state*/
   speex_decoder_destroy(state);
   /*Destroy the bit-stream struct*/
   speex_bits_destroy(&bits);
   fclose(fout);
   return 0;
}
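The sample decoder trusts the length prefix it reads from stdin, so a corrupted stream could make it read more than the 200 bytes that cbits can hold. An application that reuses this ad-hoc framing may want a bounds-checked reader. The function below is one possible sketch; read_packet and its return-value convention are assumptions for this example and are not part of the Speex API.

```c
#include <stdio.h>

/* Read one length-prefixed packet as written by sampleenc.
 * Returns the payload size in bytes, 0 at end of stream (or when the
 * length prefix itself is cut short), and -1 on a bad length or a
 * truncated payload. */
int read_packet(FILE *f, char *buf, int buf_size)
{
    int n;
    if (fread(&n, sizeof(int), 1, f) != 1)
        return feof(f) ? 0 : -1;
    if (n <= 0 || n > buf_size)
        return -1;
    if (fread(buf, 1, (size_t)n, f) != (size_t)n)
        return -1;
    return n;
}
```

In sampledec, the two fread() calls and the feof() test could then be replaced by a single call such as nbBytes = read_packet(stdin, cbits, 200), leaving the loop when the result is not positive.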


C IETF RTP Profile

Internet Engineering Task Force                          Greg Herlein
Internet Draft                                        Jean-Marc Valin
draft-herlein-avt-rtp-speex-00.txt                       Simon Morlat
March 3, 2004                                          Roger Hardiman
Expires: September 3, 2004                                  Phil Kerr

               RTP Payload Format for the Speex Codec

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Speex is an open-source voice codec suitable for use in Voice over IP (VoIP) type applications. This document describes the payload format for Speex generated bit streams within an RTP packet. Also included here are the necessary details for the use of Speex with the Session Description Protocol (SDP) and a preliminary method of using Speex within H.323 applications.

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].

2. Overview of the Speex Codec

Speex is based on the CELP [12] encoding technique with support for either narrowband (nominal 8 kHz), wideband (nominal 16 kHz) or ultra-wideband (nominal 32 kHz), and (non-optimal) rates up to 48 kHz sampling also available. The main characteristics can be summarized as follows:

   o  Free software/open-source
   o  Integration of wideband and narrowband in the same bit-stream
   o  Wide range of bit-rates available
   o  Dynamic bit-rate switching and variable bit-rate (VBR)
   o  Voice Activity Detection (VAD, integrated with VBR)
   o  Variable complexity

3. RTP payload format for Speex

For RTP based transportation of Speex encoded audio the standard RTP header [2] is followed by one or more payload data blocks. An optional padding terminator may also be used.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |               one or more frames of Speex ....                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       one or more frames of Speex ....        |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1 RTP Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The RTP header begins with an octet of fields (V, P, X, and CC) to support specialized RTP uses (see [8] and [9] for details). For Speex the following values are used.


Version (V): 2 bits

This field identifies the version of RTP. The version used by this specification is two (2).

Padding (P): 1 bit

If the padding bit is set, the packet contains one or more additional padding octets at the end which are not part of the payload. P is set if the total packet size is less than the MTU.

Extension (X): 1 bit

If the extension, X, bit is set, the fixed header MUST be followed by exactly one header extension, with a format defined in Section 5.3.1 of [8].

CSRC count (CC): 4 bits

The CSRC count contains the number of CSRC identifiers.

Marker (M): 1 bit

The M bit indicates if the packet contains comfort noise. This field is used in conjunction with the cng SDP attribute and is detailed further in section 5 below. In normal usage this bit is set if the packet contains comfort noise.

Payload Type (PT): 7 bits

An RTP profile for a class of applications is expected to assign a payload type for this format, or a dynamically allocated payload type SHOULD be chosen which designates the payload as Speex.

Sequence number: 16 bits

The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. This field is detailed further in [2].

Timestamp: 32 bits

A timestamp representing the sampling time of the first sample of the first Speex packet in the RTP packet. The clock frequency MUST be set to the sample rate of the encoded audio data.

Speex uses 20 msec frames and a variable sampling rate clock. The RTP timestamp MUST be in units of 1/X of a second where X is the sample rate used. Speex uses a nominal 8 kHz sampling rate for narrowband use, a nominal 16 kHz sampling rate for wideband use, and a nominal 32 kHz sampling rate for ultra-wideband use.

SSRC/CSRC identifiers:

These two fields, 32 bits each with one SSRC field and a maximum of 15 CSRC fields (the CC field is 4 bits wide), are as defined in [2].
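To make the field layout concrete, the fixed 12-octet part of the header can be serialized as follows. This is a sketch rather than code from any RTP stack: the struct and function names are invented for the example, CSRC entries and header extensions are left out, and multi-octet fields are written in network byte order as the RTP specification [2] requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the fixed RTP header, one member per field for clarity. */
struct rtp_header {
    unsigned version;       /* 2 bits, always 2 */
    unsigned padding;       /* 1 bit */
    unsigned extension;     /* 1 bit */
    unsigned csrc_count;    /* 4 bits */
    unsigned marker;        /* 1 bit */
    unsigned payload_type;  /* 7 bits, e.g. a dynamic type such as 97 */
    uint16_t sequence;
    uint32_t timestamp;
    uint32_t ssrc;
};

/* Write the fixed header into out[0..11]; returns the octets written. */
size_t rtp_header_pack(const struct rtp_header *h, uint8_t out[12])
{
    out[0] = (uint8_t)(((h->version & 3u) << 6) | ((h->padding & 1u) << 5) |
                       ((h->extension & 1u) << 4) | (h->csrc_count & 0xFu));
    out[1] = (uint8_t)(((h->marker & 1u) << 7) | (h->payload_type & 0x7Fu));
    out[2] = (uint8_t)(h->sequence >> 8);
    out[3] = (uint8_t)(h->sequence & 0xFFu);
    for (int i = 0; i < 4; i++) {
        out[4 + i] = (uint8_t)(h->timestamp >> (24 - 8 * i));
        out[8 + i] = (uint8_t)(h->ssrc >> (24 - 8 * i));
    }
    return 12;
}
```

The Speex frames produced by the encoder would follow immediately after these 12 octets (or after the CSRC list when CC is non-zero).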


3.2 Speex payload

For the purposes of packetizing the bit stream in RTP, it is only necessary to consider the sequence of bits as output by the Speex encoder [11], and present the same sequence to the decoder. The payload format described here maintains this sequence.

A typical Speex frame, encoded at the maximum bitrate, is approx. 110 octets and the total size of the Speex frames in a packet SHOULD be kept less than the path MTU to prevent fragmentation. Speex frames MUST NOT be fragmented across multiple RTP packets.

An RTP packet MAY contain Speex frames of the same bit rate or of varying bit rates, since the bit-rate for a frame is conveyed in band with the signal.

The encoding and decoding algorithm can change the bit rate at any 20 msec frame boundary, with the bit rate change notification provided in-band with the bit stream. Each frame contains both "mode" (narrowband, wideband or ultra-wideband) and "sub-mode" (bit-rate) information in the bit stream. No out-of-band notification is required for the decoder to process changes in the bit rate sent by the encoder.

It is RECOMMENDED that values of 8000, 16000 and 32000 be used for normal internet telephony applications, though the sample rate is supported at rates as low as 6000 Hz and as high as 48 kHz.

The RTP payload MUST be padded to provide an integer number of octets as the payload length. These padding bits are LSB aligned in network byte order and consist of a 0 followed by all ones (until the end of the octet). This padding is only required for the last frame in the packet, and only to ensure the packet contents ends on an octet boundary.
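The padding rule can be sketched as a small helper. It assumes that the payload bits are packed most significant bit first within each octet, so the unused bits are the low-order bits of the last octet; pad_to_octet is a name made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* buf holds nbits valid payload bits, packed MSB first (assumption).
 * Fills the unused low-order bits of the last octet with a 0 followed by
 * ones and returns the payload length in octets. buf must have room for
 * (nbits + 7) / 8 octets. */
size_t pad_to_octet(uint8_t *buf, size_t nbits)
{
    size_t used = nbits % 8;
    if (used == 0)
        return nbits / 8;               /* already on an octet boundary */

    size_t pad = 8 - used;              /* number of padding bits, 1..7 */
    uint8_t ones = (uint8_t)((1u << (pad - 1)) - 1u); /* pad-1 low bits set */
    uint8_t keep = (uint8_t)(0xFFu << pad);           /* mask of payload bits */
    size_t last = nbits / 8;

    buf[last] = (uint8_t)((buf[last] & keep) | ones);
    return last + 1;
}
```

With 5 padding bits the pattern written is 01111, which is the case illustrated in section 3.2.1.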

3.2.1 Example Speex packet

In the example below we have a single Speex frame with 5 bits of padding to ensure the packet size falls on an octet boundary.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   ..speex data..                    |0 1 1 1 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.4 Multiple Speex frames in a RTP packet

Below is an example of two Speex frames contained within one RTP packet. The Speex frame lengths in this example fall on an octet boundary so there is no padding.

Speex codecs [11] are able to detect the bit rate from the payload and are responsible for detecting the 20 msec boundaries between each frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        ..speex data..         |        ..speex data..         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. MIME registration of Speex

Full definition of the MIME type for Speex will be part of the Ogg Vorbis MIME type definition application [10].

MIME media type name: audio

MIME subtype: speex

Optional parameters:

Required parameters: to be included in the Ogg MIME specification.

Encoding considerations:

Security Considerations: See Section 6 of RFC 3047.

Interoperability considerations: none

Published specification:

Applications which use this media type:

Additional information: none

Person & email address to contact for further information:
   Greg Herlein
   Jean-Marc Valin

Intended usage: COMMON

Author/Change controller:
   Author: Greg Herlein
   Change controller: Greg Herlein

This transport type signifies that the content is to be interpreted according to this document if the contents are transmitted over RTP. Should this transport type appear over a lossless streaming protocol such as TCP, the content encapsulation should be interpreted as an Ogg Stream in accordance with RFC 3534, with the exception that the content of the Ogg Stream may be assumed to be Speex audio and Speex audio only.

5. SDP usage of Speex

When conveying information by SDP [4], the encoding name MUST be set to "speex". An example of the media representation in SDP for offering a single channel of Speex at 8000 samples per second might be:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000

Note that the RTP payload type code of 97 is defined in this media definition to be ’mapped’ to the speex codec at an 8 kHz sampling frequency using the ’a=rtpmap’ line. Any number from 96 to 127 could have been chosen (the allowed range for dynamic types).


The value of the sampling frequency is typically 8000 for narrowband operation, 16000 for wideband operation, and 32000 for ultra-wideband operation.

If for some reason the offerer has bandwidth limitations, the client may use the "b=" header, as explained in SDP [4]. The following example illustrates the case where the offerer cannot receive more than 10 kbit/s.

   m=audio 8088 RTP/AVP 97
   b=AS:10
   a=rtpmap:97 speex/8000

In this case, if the remote party agrees, it should configure its Speex encoder so that it does not use modes that produce more than 10 kbit/s. Note that the "b=" constraint also applies on all payload types that may be proposed in the media line ("m=").

Another way to make recommendations to the remote Speex encoder is to use its specific parameters via the a=fmtp: directive. The following parameters are defined for use in this way:

   ptime: duration of each packet in milliseconds.

   sr:    actual sample rate in Hz.

   ebw:   encoding bandwidth - either ’narrow’ or ’wide’ or ’ultra’ (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).

   vbr:   variable bit rate - either ’on’ ’off’ or ’vad’ (defaults to off). If on, variable bit rate is enabled. If off, disabled. If set to ’vad’ then constant bit rate is used but silence will be encoded with special short frames to indicate a lack of voice for that period.

   cng:   comfort noise generation - either ’on’ or ’off’. If off then silence frames will be silent; if ’on’ then those frames will be filled with comfort noise.

   mode:  Speex encoding mode. Can be {1,2,3,4,5,6,any}; defaults to 3 in narrowband, 6 in wide and ultra-wide.

   penh:  use of perceptual enhancement. 1 indicates to the decoder that perceptual enhancement is recommended, 0 indicates that it is not. Defaults to on (1).


Examples:

   m=audio 8008 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 mode=4

This example illustrates an offerer that wishes to receive a Speex stream at 8000 Hz, but only using speex mode 4.

The offerer may suggest to the remote decoder to activate its perceptual enhancement filter like this:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 penh=1

Several Speex specific parameters can be given in a single a=fmtp line provided that they are separated by a semi-colon:

   a=fmtp:97 mode=any;penh=1
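A receiver has to pick individual values out of such a parameter string. The function below is a minimal sketch of that step for the part that follows the payload type (for instance "mode=any;penh=1"); fmtp_get is a hypothetical helper written for this example and does not come from any SDP library.

```c
#include <stddef.h>
#include <string.h>

/* Look up key in a "k1=v1;k2=v2" string and copy its value into out.
 * Returns 1 if the key was found and the value fits, 0 otherwise. */
int fmtp_get(const char *params, const char *key, char *out, size_t out_size)
{
    size_t klen = strlen(key);
    const char *p = params;

    while (*p) {
        /* Skip separators and optional blanks between pairs. */
        while (*p == ' ' || *p == ';')
            p++;
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= out_size)
                return 0;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += len;
    }
    return 0;
}
```

A parameter that is absent from the string would then fall back to the default listed for it above.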

The offerer may indicate that it wishes to send variable bit rate frames with comfort noise:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 vbr=on;cng=on

The "ptime" attribute is used to denote the packetization interval (i.e., how many milliseconds of audio is encoded in a single RTP packet). Since Speex uses 20 msec frames, ptime values of multiples of 20 denote multiple Speex frames per packet. Values of ptime which are not multiples of 20 MUST be ignored and clients MUST use the default value of 20 instead.
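The ptime rule and the timestamp clock described in section 3.1 combine into a simple calculation. The two functions below are a sketch with invented names; they only encode what the text states: 20 msec frames, a fallback to 20 for invalid ptime values, and a timestamp clock equal to the sample rate.

```c
#include <stdint.h>

/* Number of 20 msec Speex frames carried per RTP packet. Values that are
 * not positive multiples of 20 are ignored and the default of 20 is used. */
int frames_per_packet(int ptime_ms)
{
    if (ptime_ms <= 0 || ptime_ms % 20 != 0)
        ptime_ms = 20;
    return ptime_ms / 20;
}

/* Amount by which the RTP timestamp advances from one packet to the next.
 * One 20 msec frame covers sample_rate / 50 samples. */
uint32_t timestamp_increment(int ptime_ms, uint32_t sample_rate)
{
    return (uint32_t)frames_per_packet(ptime_ms) * (sample_rate / 50u);
}
```

For narrowband with ptime set to 40, each packet carries 2 frames and the timestamp advances by 320.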

In the example below the ptime value is set to 40, indicating that there are 2 frames in each packet.

   m=audio 8008 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=ptime:40

Note that the ptime parameter applies to all payloads listed in the media line and is not used as part of an a=fmtp directive.

Values of ptime that are not a multiple of 20 msec are meaningless, so the receiver of such ptime values MUST ignore them. If during the life of an RTP session the ptime value changes, when there are multiple Speex frames for example, the SDP value must also reflect the new value.


Care must be taken when setting the value of ptime so that the RTP packet size does not exceed the path MTU.

6. ITU H.323/H.245 Use of Speex

Application is underway to make Speex a standard ITU codec. However, until that is finalized, Speex MAY be used in H.323 [6] by using a non-standard codec block definition in the H.245 [7] codec capability negotiations.

6.1 NonStandardMessage format

For Speex use in H.245 [7] based systems, the fields in the NonStandardMessage should be:

   t35CountryCode   = Hex: B5
   t35Extension     = Hex: 00
   manufacturerCode = Hex: 0026
   [Length of the Binary Sequence (8 bit number)]
   [Binary Sequence consisting of an ASCII string, no NULL terminator]

The binary sequence is an ascii string merely for ease of use. The string is not null terminated. The format of this string is

   speex [optional variables]

The optional variables are identical to those used for the SDP a=fmtp strings discussed in section 5 above. The string is built to be all on one line, each key-value pair separated by a semi-colon. The optional variables MAY be omitted, which causes the default values to be assumed. They are:

   ebw=narrow;mode=3;vbr=off;cng=off;ptime=20;sr=8000;penh=no;

The fifth byte of the block is the length of the binary sequence.

NOTE: this method can result in the advertising of a large number of Speex ’codecs’ based on the number of variables possible. For most VoIP applications, use of the default binary sequence of ’speex’ is RECOMMENDED to be used in addition to all other options. This maximizes the chances that two H.323 based applications that support Speex can find a mutual codec.
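The byte layout described in this section can be sketched as follows. This only assembles the five leading octets and the ASCII string in the order given above; it is not an H.245 (ASN.1 PER) encoder, the two-octet representation of manufacturerCode is an assumption taken from the "0026" notation, and nonstandard_block is a name invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the block: B5 00 00 26 <length> <ASCII string, no terminator>.
 * Returns the number of octets written, or 0 if the string is longer than
 * 255 characters or does not fit in out. */
size_t nonstandard_block(const char *ascii, uint8_t *out, size_t out_size)
{
    size_t len = strlen(ascii);
    if (len > 255 || out_size < 5 + len)
        return 0;

    out[0] = 0xB5;          /* t35CountryCode */
    out[1] = 0x00;          /* t35Extension */
    out[2] = 0x00;          /* manufacturerCode, high octet (assumed order) */
    out[3] = 0x26;          /* manufacturerCode, low octet */
    out[4] = (uint8_t)len;  /* fifth byte: length of the binary sequence */
    memcpy(out + 5, ascii, len);
    return 5 + len;
}
```

For the default sequence ’speex’ the block is 10 octets long and its fifth octet is 5.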

6.2 RTP Payload Types

Dynamic payload type codes MUST be negotiated ’out-of-band’ for the assignment of a dynamic payload type from the range of 96-127. H.323 applications MUST use the H.245 H2250LogicalChannelParameters encoding to accomplish this.

7. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [2], and any appropriate RTP profile. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to be overloaded. However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high.


8. Normative References

1.  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

2.  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for real-time applications", RFC 1889, January 1996.

3.  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

4.  Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

5.  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

6.  ITU-T Recommendation H.323, "Packet-based Multimedia Communications Systems", 1998.

7.  ITU-T Recommendation H.245 (1998), "Control of communications between Visual Telephone Systems and Terminal Equipment".

8.  RTP: A transport protocol for real-time applications. Work in progress, draft-ietf-avt-rtp-new-12.txt.

9.  RTP Profile for Audio and Video Conferences with Minimal Control. Work in progress, draft-ietf-avt-profile-new-13.txt.

10. L. Walleij, "The application/ogg Media Type", RFC 3534, May 2003.

8.1 Informative References

11. Speexenc/speexdec, reference command-line encoder/decoder, Speex website, http://www.speex.org/

12. CELP, U.S. Federal Standard 1016. National Technical Information Service (NTIS) website, http://www.ntis.gov/

9. Acknowledgments

The authors would like to thank Equivalence Pty Ltd of Australia for their assistance in attempting to standardize the use of Speex in H.323 applications, and for implementing Speex in their open source OpenH323 stack. The authors would also like to thank Brian C. Wiles of StreamComm for his assistance in developing the proposed standard for Speex use in H.323 applications.

The authors would also like to thank the following members of the Speex and AVT communities for their input: Ross Finlayson, Federico Montesino Pouzols, Henning Schulzrinne, Magnus Westerlund.

10. Author’s Address

Greg Herlein
2034 Filbert Street
San Francisco, CA
United States 94123

Jean-Marc Valin
Department of Electrical and Computer Engineering
University of Sherbrooke
2500 blvd Université
Sherbrooke, Quebec, Canada, J1K 2R1

Simon MORLAT
35, av de Vizille App 42
38000 GRENOBLE
FRANCE

Roger Hardiman
49 Nettleton Road
Cheltenham
Gloucestershire
GL51 6NR
England

Phil Kerr
Centre for Music Technology
University of Glasgow
Glasgow
G12 8LT
Scotland

10. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.


D Speex License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

• Neither the name of the Xiph.org Foundation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


EGNUFreeDocumentationLicense

Version1.1,March2000

Copyright(C)2000FreeSoftwareFoundation,Inc.59TemplePlace,Suite330,Boston,MA02111-1307USAEveryoneispermittedtocopyanddistributeverbatimcopiesofthislicensedocument,butchangingitisnotallowed.

0.PREAMBLE

ThepurposeofthisLicenseistomakeamanual,textbook,orotherwrittendocument\"free\"inthesenseoffreedom:toassureeveryonetheeffectivefreedomtocopyandredistributeit,withorwithoutmodifyingit,eithercommerciallyornoncommercially.Secondarily,thisLicensepreservesfortheauthorandpublisherawaytogetcreditfortheirwork,whilenotbeingconsideredresponsibleformodificationsmadebyothers.ThisLicenseisakindof\"copyleft\whichmeansthatderivativeworksofthedocumentmustthemselvesbefreeinthesamesense.ItcomplementstheGNUGeneralPublicLicense,whichisacopyleftlicensedesignedforfreesoftware.

WehavedesignedthisLicenseinordertouseitformanualsforfreesoftware,becausefreesoftwareneedsfreedocumentation:afreeprogramshouldcomewithmanualsprovidingthesamefreedomsthatthesoftwaredoes.ButthisLicenseisnotlimitedtosoftwaremanuals;itcanbeusedforanytextualwork,regardlessofsub-jectmatterorwhetheritispublishedasaprintedbook.WerecommendthisLicenseprincipallyforworkswhosepurposeisinstructionorreference.

1.APPLICABILITYANDDEFINITIONS

ThisLicenseappliestoanymanualorotherworkthatcontainsanoticeplacedbythecopyrightholdersayingitcanbedistributedunderthetermsofthisLicense.The\"Document\below,referstoanysuchmanualorwork.Anymemberofthepublicisalicensee,andisaddressedas\"you\".

A\"ModifiedVersion\"oftheDocumentmeansanyworkcontainingtheDocumentoraportionofit,eithercopiedverbatim,orwithmodificationsand/ortranslatedintoanotherlanguage.

A\"SecondarySection\"isanamedappendixorafront-mattersectionoftheDoc-umentthatdealsexclusivelywiththerelationshipofthepublishersorauthorsoftheDocumenttotheDocument’soverallsubject(ortorelatedmatters)andcontainsnoth-ingthatcouldfalldirectlywithinthatoverallsubject.(Forexample,iftheDocumentisinpartatextbookofmathematics,aSecondarySectionmaynotexplainanymathe-matics.)Therelationshipcouldbeamatterofhistoricalconnectionwiththesubjectorwithrelatedmatters,oroflegal,commercial,philosophical,ethicalorpoliticalpositionregardingthem.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).

C. State on the Title page the name of the publisher of the Modified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.

H. Include an unaltered copy of this License.

I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.

K. In any section entitled "Acknowledgements" or "Dedications", preserve the section’s title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.

M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.

N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties – for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document’s Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

Index

ACELP, 28
algorithmic delay, 8
analysis-by-synthesis, 19
API, 11
auto-correlation, 18
average bit-rate, 7, 14
bit-rate, 23
CELP, 6, 17
complexity, 6, 7, 22, 23
constant bit-rate, 7
discontinuous transmission, 8, 14
DTMF, 7, 28
error weighting, 19
in-band signalling, 15
Levinson-Durbin, 18
libspeex, 11
line spectral pair, 19, 21
linear prediction, 17, 21
mean opinion score, 22
music, 27
narrowband, 6, 7, 21
Ogg, 16, 26
open-source, 6, 26
patent, 6, 26
perceptual enhancement, 8, 13, 23
pitch, 19, 21
quadrature mirror filter, 24
quality, 7
RTP, 16
sampling rate, 7
speexdec, 10
speexenc, 9
standards, 16
ultra-wideband, 7
variable bit-rate, 6, 7, 13
voice activity detection, 6, 8, 14
Vorbis, 26
wideband, 6, 7, 24
