(version 1.0.4)
Jean-Marc Valin
14th July 2004
Copyright (c) 2002-2004 Jean-Marc Valin.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Section, with no Front-Cover Texts, and with no Back-Cover. A copy of the license is included in the section entitled "GNU Free Documentation License".
Contents

1 Introduction to Speex
2 Feature description
3 Command-line encoder/decoder
    3.1 speexenc
    3.2 speexdec
4 Programming with Speex (the libspeex API)
    4.1 Encoding
    4.2 Decoding
    4.3 Codec Options (speex_*_ctl)
    4.4 Mode queries
    4.5 Packing and in-band signalling
5 Formats and standards
    5.1 RTP Payload Format
    5.2 MIME Type
    5.3 Ogg file format
6 Introduction to CELP Coding
    6.1 Linear Prediction (LPC)
    6.2 Pitch Prediction
    6.3 Innovation Codebook
    6.4 Analysis-by-Synthesis and Error Weighting
7 Speex narrowband mode
    7.1 LPC Analysis
    7.2 Pitch Prediction (adaptive codebook)
    7.3 Innovation Codebook
    7.4 Bit allocation
    7.5 Perceptual enhancement
8 Speex wideband mode (sub-band CELP)
    8.1 Linear Prediction
    8.2 Pitch Prediction
    8.3 Excitation Quantization
    8.4 Bit allocation
A FAQ
B Sample code
    B.1 sampleenc.c
    B.2 sampledec.c
C IETF RTP Profile
D Speex License
E GNU Free Documentation License
List of Tables

1 In-band signalling codes
2 Ogg/Speex header packet
3 Bit allocation for narrowband modes
4 Quality versus bit-rate
5 Bit allocation for high-band in wideband mode
1 Introduction to Speex

The Speex project (http://www.speex.org/) was started because there was a need for a speech codec that was open-source and free from software patents. These are essential conditions for being used by any open-source software. There is already Vorbis that does general audio, but it is not really suitable for speech. Also, unlike many other speech codecs, Speex is not targeted at cell phones (not many open-source cell phones anyway :-)) but rather at voice over IP (VoIP) and file-based compression.

As design goals, we wanted to have a codec that would allow both very good quality speech and low bit-rate (unfortunately not at the same time!), which led us to developing a codec with multiple bit-rates. Of course very good quality also meant we had to do wideband (16 kHz sampling rate) in addition to narrowband (telephone quality, 8 kHz sampling rate).

Designing for VoIP instead of cell phone use means that Speex must be robust to lost packets, but not to corrupted ones since packets either arrive unaltered or don't arrive at all. Also, the idea was to have a reasonable complexity and memory requirement without compromising too much on the efficiency of the codec.

All this led us to the choice of CELP as the encoding technique to use for Speex. One of the main reasons is that CELP has long proved that it could do the job and scale well to both low bit-rates (think DoD CELP @ 4.8 kbps) and high bit-rates (think G.728 @ 16 kbps).

The main characteristics can be summarized as follows:

- Free software/open-source, patent and royalty-free
- Integration of narrowband and wideband in the same bit-stream
- Wide range of bit-rates available (from 2 kbps to 44 kbps)
- Dynamic bit-rate switching and Variable Bit-Rate (VBR)
- Voice Activity Detection (VAD, integrated with VBR)
- Variable complexity
- Ultra-wideband mode at 32 kHz (up to 48 kHz)
- Intensity stereo encoding option

This document is divided in the following way. Section 2 describes the different Speex features and defines some terms that will be used in later sections. Section 3 provides information about the standard command-line tools, while section 4 contains information about programming using the Speex API. Section 5 has some information related to Speex and standards. The last three sections describe the internals of the codec and require some signal processing knowledge. Section 6 explains the general idea behind CELP, while sections 7 and 8 are specific to Speex. Note that if you are only interested in using Speex, those last three sections are not required.
2 Feature description

This section explains the main Speex features, as well as some concepts in speech coding that help better understand the next sections.

Sampling rate

Speex is mainly designed for 3 different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

Quality

Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.

Complexity (variable)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way that's similar to the -1 to -9 options of the gzip and bzip2 compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.

Variable Bit-Rate (VBR)

Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s, f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there's no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

Average Bit-Rate (ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
Voice Activity Detection (VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

Discontinuous Transmission (DTX)

Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).
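
The 250 bps figure can be checked with a line of arithmetic, assuming the 20 ms narrowband frame size given in section 7 (that is, 50 frames per second):

```latex
\[
\frac{5\ \text{bits/frame}}{0.020\ \text{s/frame}}
  = 5\ \text{bits} \times 50\ \text{frames/s}
  = 250\ \text{bps}
\]
```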
Perceptual enhancement

Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound objectively further from the original (if you use SNR), but in the end it still sounds better (subjective improvement).

Algorithmic delay

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.
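
As a worked example for the narrowband figure, and assuming the look-ahead is the half frame that the LPC analysis window extends past the current frame (section 7.1), the 20 ms frame of section 7 gives:

```latex
\[
\text{delay}_{\text{nb}}
  = \underbrace{20\ \text{ms}}_{\text{frame}}
  + \underbrace{10\ \text{ms}}_{\text{look-ahead}}
  = 30\ \text{ms}
  \quad\Longleftrightarrow\quad
  0.030\ \text{s} \times 8000\ \text{samples/s} = 240\ \text{samples}
\]
```

The wideband value of 34 ms is not broken down in this document, so no equivalent decomposition is attempted here.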
3 Command-line encoder/decoder

The base Speex distribution includes a command-line encoder (speexenc) and decoder (speexdec). This section describes how to use these tools.

3.1 speexenc

The speexenc utility is used to create Speex files from raw PCM or wave files. It can be used by calling:

    speexenc [options] input_file output_file

The value '-' for input_file or output_file corresponds respectively to stdin and stdout. The valid options are:

    --narrowband (-n)      Tell Speex to treat the input as narrowband (8 kHz). This is the default
    --wideband (-w)        Tell Speex to treat the input as wideband (16 kHz)
    --ultra-wideband (-u)  Tell Speex to treat the input as "ultra-wideband" (32 kHz)
    --quality n            Set the encoding quality (0-10), default is 8
    --bitrate n            Encoding bit-rate (use bit-rate n or lower)
    --vbr                  Enable VBR (Variable Bit-Rate), disabled by default
    --abr n                Enable ABR (Average Bit-Rate) at n kbps, disabled by default
    --vad                  Enable VAD (Voice Activity Detection), disabled by default
    --dtx                  Enable DTX (Discontinuous Transmission), disabled by default
    --nframes n            Pack n frames in each Ogg packet (this saves space at low bit-rates)
    --comp n               Set encoding speed/quality tradeoff. The higher the value of n, the slower the encoding (default is 3)
    -V                     Verbose operation, print bit-rate currently in use
    --help (-h)            Print the help
    --version (-v)         Print version information

Speex comments

    --comment              Add the given string as an extra comment. This may be used multiple times.
    --author               Author of this track.
    --title                Title for this track.
Raw input options

    --rate n               Sampling rate for raw input
    --stereo               Consider raw input as stereo
    --le                   Raw input is little-endian
    --be                   Raw input is big-endian
    --8bit                 Raw input is 8-bit unsigned
    --16bit                Raw input is 16-bit signed

3.2 speexdec

The speexdec utility is used to decode Speex files and can be used by calling:

    speexdec [options] speex_file [output_file]

The value '-' for input_file or output_file corresponds respectively to stdin and stdout. Also, when no output_file is specified, the file is played to the soundcard. The valid options are:

    --enh                  Enable post-filter (default)
    --no-enh               Disable post-filter
    --force-nb             Force decoding in narrowband
    --force-wb             Force decoding in wideband
    --force-uwb            Force decoding in ultra-wideband
    --mono                 Force decoding in mono
    --stereo               Force decoding in stereo
    --rate n               Force decoding at n Hz sampling rate
    --packet-loss n        Simulate n % random packet loss
    -V                     Verbose operation, print bit-rate currently in use
    --help (-h)            Print the help
    --version (-v)         Print version information
4 Programming with Speex (the libspeex API)

This section explains how to use the Speex API. Examples of code can also be found in appendix B.

4.1 Encoding

In order to encode speech using Speex, you first need to:

    #include <speex.h>

You then need to declare a Speex bit-packing struct

    SpeexBits bits;

and a Speex encoder state

    void *enc_state;

The two are initialized by:

    speex_bits_init(&bits);
    enc_state = speex_encoder_init(&speex_nb_mode);

For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used by the mode you are using. You can get that value in the frame_size variable with:

    speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);

Once the initialization is done, for every input frame:

    speex_bits_reset(&bits);
    speex_encode(enc_state, input_frame, &bits);
    nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);

where input_frame is a (float *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.

After you're done with the encoding, free all resources with:

    speex_bits_destroy(&bits);
    speex_encoder_destroy(enc_state);

That's about it for the encoder.

4.2 Decoding

In order to decode speech using Speex, you first need to:

    #include <speex.h>

You also need to declare a Speex bit-packing struct

    SpeexBits bits;

and a Speex decoder state

    void *dec_state;

The two are initialized by:

    speex_bits_init(&bits);
    dec_state = speex_decoder_init(&speex_nb_mode);

For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable with:

    speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);

There is also a parameter that can be set for the decoder: whether or not to use a perceptual post-filter. This can be set by:

    speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);

where enh is an int with value 0 to have the post-filter disabled and 1 to have it enabled.
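
The frame buffers passed to the codec are arrays of float, while audio usually arrives as 16-bit PCM. As a minimal sketch, assuming the samples should simply keep the signed-short range that the FAQ in appendix A describes, the conversion in both directions could look like the following. The names pcm16_to_float and float_to_pcm16 are hypothetical helpers written for this example and are not part of libspeex.

```c
#include <assert.h>

/* Hypothetical helper (not part of libspeex): copy 16-bit PCM samples into
   a float buffer. No rescaling is applied, so the floats keep the
   signed-short range. */
static void pcm16_to_float(const short *in, float *out, int n)
{
    int i;
    assert(in && out && n >= 0);
    for (i = 0; i < n; i++)
        out[i] = (float)in[i];
}

/* Hypothetical helper: convert float samples back to 16-bit PCM, rounding
   to the nearest integer and clipping to [-32768, 32767] so that values
   slightly outside the range do not wrap around. */
static void float_to_pcm16(const float *in, short *out, int n)
{
    int i;
    assert(in && out && n >= 0);
    for (i = 0; i < n; i++) {
        float v = in[i];
        if (v > 32767.0f)
            v = 32767.0f;
        else if (v < -32768.0f)
            v = -32768.0f;
        out[i] = (short)(v < 0.0f ? v - 0.5f : v + 0.5f);
    }
}
```

In an application, pcm16_to_float would fill the buffer handed to the encoder, and float_to_pcm16 would be applied to decoded frames before they are written out as 16-bit PCM.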
Again, once the decoder initialization is done, for every input frame:

    speex_bits_read_from(&bits, input_bytes, nbBytes);
    speex_decode(dec_state, &bits, output_frame);

where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (float *) that points to the area where the decoded speech frame will be written. A NULL value as the second argument (in place of &bits) indicates that we don't have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.

After you're done with the decoding, free all resources with:

    speex_bits_destroy(&bits);
    speex_decoder_destroy(dec_state);

4.3 Codec Options (speex_*_ctl)

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:

    void speex_encoder_ctl(void *encoder, int request, void *ptr);
    void speex_decoder_ctl(void *decoder, int request, void *ptr);

The different values of request allowed are (note that some only apply to the encoder or the decoder):

    SPEEX_SET_ENH**          Set perceptual enhancer to on (1) or off (0) (integer)
    SPEEX_GET_ENH**          Get perceptual enhancer status (integer)
    SPEEX_GET_FRAME_SIZE     Get the frame size used for the current mode (integer)
    SPEEX_SET_QUALITY*       Set the encoder speech quality (integer 0 to 10)
    SPEEX_GET_QUALITY*       Get the current encoder speech quality (integer 0 to 10)
    SPEEX_SET_MODE*†
    SPEEX_GET_MODE*†
    SPEEX_SET_LOW_MODE*†
    SPEEX_GET_LOW_MODE*†
    SPEEX_SET_HIGH_MODE*†
    SPEEX_GET_HIGH_MODE*†
    SPEEX_SET_VBR*           Set variable bit-rate (VBR) to on (1) or off (0) (integer)
    SPEEX_GET_VBR*           Get variable bit-rate (VBR) status (integer)
    SPEEX_SET_VBR_QUALITY*   Set the encoder VBR speech quality (float 0 to 10)
    SPEEX_GET_VBR_QUALITY*   Get the current encoder VBR speech quality (float 0 to 10)
    SPEEX_SET_COMPLEXITY*    Set the CPU resources allowed for the encoder (integer 1 to 10)
    SPEEX_GET_COMPLEXITY*    Get the CPU resources allowed for the encoder (integer 1 to 10)
    SPEEX_SET_BITRATE*       Set the bit-rate to use to the closest value not exceeding the parameter (integer in bps)
    SPEEX_GET_BITRATE        Get the current bit-rate in use (integer in bps)
    SPEEX_SET_SAMPLING_RATE  Set real sampling rate (integer in Hz)
    SPEEX_GET_SAMPLING_RATE  Get real sampling rate (integer in Hz)
    SPEEX_RESET_STATE        Reset the encoder/decoder state to its original state (zeros all memories)
    SPEEX_SET_VAD*           Set voice activity detection (VAD) to on (1) or off (0) (integer)
    SPEEX_GET_VAD*           Get voice activity detection (VAD) status (integer)
    SPEEX_SET_DTX*           Set discontinuous transmission (DTX) to on (1) or off (0) (integer)
    SPEEX_GET_DTX*           Get discontinuous transmission (DTX) status (integer)
    SPEEX_SET_ABR*           Set average bit-rate (ABR) to a value n in bits per second (integer in bps)
    SPEEX_GET_ABR*           Get average bit-rate (ABR) setting (integer in bps)

    *  applies only to the encoder
    ** applies only to the decoder
    †  normally only used internally

4.4 Mode queries

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:

    void speex_mode_query(SpeexMode *mode, int request, void *ptr);

The admissible values for request are (unless otherwise noted, the values are returned through ptr):

    SPEEX_MODE_FRAME_SIZE    Get the frame size (in samples) for the mode
    SPEEX_SUBMODE_BITRATE    Get the bit-rate for a submode number specified through ptr (integer in bps)

4.5 Packing and in-band signalling

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in table 4. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn't involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.
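
To make the terminator layout concrete, here is a minimal sketch of a bit writer that appends the 5-bit code 15 after some frame bits. BitWriter and the bw_* functions are hypothetical names invented for this example; real code should use the SpeexBits type and the speex_bits_* functions from libspeex. The most-significant-bit-first packing order is also an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal bit writer, for illustration only. */
typedef struct {
    unsigned char buf[64];
    int nbits;                 /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw)
{
    memset(bw, 0, sizeof(*bw));
}

/* Append the `nbits` least-significant bits of `value`,
   most-significant bit first (assumed bit order). */
static void bw_pack(BitWriter *bw, unsigned value, int nbits)
{
    int i;
    assert(nbits >= 0 && bw->nbits + nbits <= (int)sizeof(bw->buf) * 8);
    for (i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            bw->buf[bw->nbits >> 3] |= (unsigned char)(0x80u >> (bw->nbits & 7));
        bw->nbits++;
    }
}

/* Append the terminator: code 15 (decimal) encoded with 5 bits. */
static void bw_terminate(BitWriter *bw)
{
    bw_pack(bw, 15u, 5);
}

/* Number of bytes needed to hold the bits written so far. */
static int bw_nbytes(const BitWriter *bw)
{
    return (bw->nbits + 7) >> 3;
}
```

A decoder that reads 5 bits at a frame boundary and finds the value 15 then knows that no further frame follows in the packet.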
It is also possible to send in-band "messages" to the other side. All these messages are encoded as "pseudo-frames" of mode 14 which contain a 4-bit message type code, followed by the message. Table 1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.

    Code  Size (bits)  Content
    0     1            Asks decoder to set perceptual enhancement off (0) or on (1)
    1     1            Asks (if 1) the encoder to be less "aggressive" due to high packet loss
    2     4            Asks encoder to switch to mode N
    3     4            Asks encoder to switch to mode N for low-band
    4     4            Asks encoder to switch to mode N for high-band
    5     4            Asks encoder to switch to quality N for VBR
    6     4            Request acknowledge (0=no, 1=all, 2=only for in-band data)
    7     4            Asks encoder to set CBR (0), VAD (1), DTX (3), VBR (5), VBR+DTX (7)
    8     8            Transmit (8-bit) character to the other end
    9     8            Intensity stereo information
    10    16           Announce maximum bit-rate acceptable (N in bytes/second)
    11    16           reserved
    12    32           Acknowledge receiving packet N
    13    32           reserved
    14    64           reserved
    15    64           reserved

    Table 1: In-band signalling codes

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn't know how to interpret it.

5 Formats and standards

Speex can encode speech in both narrowband and wideband and provides different bit-rates. However, not all features need to be supported by a certain implementation or device. In order to be called "Speex compatible" (whatever that means), an implementation must implement at least a basic set of features.

At the minimum, all narrowband modes of operation MUST be supported at the decoder. This includes the decoding of a wideband bit-stream by the narrowband decoder [1]. If present, a wideband decoder MUST be able to decode a narrowband stream, and MAY either be able to decode all wideband modes or be able to decode the embedded narrowband part of all modes (which includes ignoring the high-band bits).

For encoders, at least one narrowband or wideband mode MUST be supported. The main reason why all encoding modes do not have to be supported is that some platforms may not be able to handle the complexity of encoding in some modes.
5.1 RTP Payload Format

The RTP payload draft is included in appendix C and the latest version is available at http://www.speex.org/drafts/latest. This draft has been sent (2003/02/26) to the Internet Engineering Task Force (IETF) and will be discussed at the March 18th meeting in San Francisco.

5.2 MIME Type

For now, you should use the MIME type audio/x-speex for Speex. We will apply for type audio/speex in the near future.

5.3 Ogg file format

Speex bit-streams can be stored in Ogg files. In this case, the first packet of the Ogg file contains the Speex header described in table 2. All integer fields in the headers are stored as little-endian. The speex_string field must contain the string "Speex   " (with 3 trailing spaces), which identifies the bit-stream. The next field, speex_version, contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more info. The beginning of stream (b_o_s) flag is set to 1 for the header. The header packet has packetno=0 and granulepos=0.

The second packet contains the Speex comment header. The format used is the Vorbis comment format described here: http://www.xiph.org/ogg/vorbis/doc/v-comment.html. This packet has packetno=1 and granulepos=0.

The third and subsequent packets each contain one or more (number found in header) Speex frames. These are identified with packetno starting from 2 and the granulepos is the number of the last sample encoded in that packet. The last of these packets has the end of stream (e_o_s) flag set to 1.
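
As an illustration of the header layout, the following sketch reads the fields listed in table 2 from a raw header packet, decoding each integer as little-endian. The struct name ExampleSpeexHeader and the functions read_le32 and parse_speex_header are hypothetical and written only for this example; the reference definitions are in speex_header.[ch], which should be preferred in real code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_HEADER_BYTES 80   /* 8 + 20 + 13 * 4 bytes, from table 2 */

/* Hypothetical struct mirroring the fields of table 2. */
typedef struct {
    char    speex_string[8];
    char    speex_version[20];
    int32_t speex_version_id, header_size, rate, mode;
    int32_t mode_bitstream_version, nb_channels, bitrate, frame_size;
    int32_t vbr, frames_per_packet, extra_headers, reserved1, reserved2;
} ExampleSpeexHeader;

/* Decode a 32-bit little-endian integer from 4 bytes. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Returns 0 on success, -1 if the packet is too short or does not start
   with "Speex" followed by 3 spaces. */
static int parse_speex_header(const unsigned char *pkt, size_t len,
                              ExampleSpeexHeader *h)
{
    assert(pkt != NULL && h != NULL);
    if (len < EXAMPLE_HEADER_BYTES || memcmp(pkt, "Speex   ", 8) != 0)
        return -1;
    memcpy(h->speex_string, pkt, 8);
    memcpy(h->speex_version, pkt + 8, 20);
    h->speex_version_id       = read_le32(pkt + 28);
    h->header_size            = read_le32(pkt + 32);
    h->rate                   = read_le32(pkt + 36);
    h->mode                   = read_le32(pkt + 40);
    h->mode_bitstream_version = read_le32(pkt + 44);
    h->nb_channels            = read_le32(pkt + 48);
    h->bitrate                = read_le32(pkt + 52);
    h->frame_size             = read_le32(pkt + 56);
    h->vbr                    = read_le32(pkt + 60);
    h->frames_per_packet      = read_le32(pkt + 64);
    h->extra_headers          = read_le32(pkt + 68);
    h->reserved1              = read_le32(pkt + 72);
    h->reserved2              = read_le32(pkt + 76);
    return 0;
}
```

Reading the bytes one at a time, rather than casting the buffer to a struct, keeps the sketch independent of the host byte order and of struct padding.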
[1] The wideband bit-stream contains an embedded narrowband bit-stream which can be decoded alone.

    Field                    Type     Size
    speex_string             char[]   8
    speex_version            char[]   20
    speex_version_id         int      4
    header_size              int      4
    rate                     int      4
    mode                     int      4
    mode_bitstream_version   int      4
    nb_channels              int      4
    bitrate                  int      4
    frame_size               int      4
    vbr                      int      4
    frames_per_packet        int      4
    extra_headers            int      4
    reserved1                int      4
    reserved2                int      4

    Table 2: Ogg/Speex header packet

6 Introduction to CELP Coding

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

1. The use of a linear prediction (LP) model to model the vocal tract
2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
3. The search performed in closed-loop in a "perceptually weighted domain"

Note that this section is still incomplete.

6.1 Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal x[n] using a linear combination of its past samples:

\[ y[n] = \sum_{i=1}^{N} a_i x[n-i] \]

where y[n] is the linear prediction of x[n]. The prediction error is thus given by:

\[ e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i] \]

The goal of the LPC analysis is to find the best prediction coefficients a_i which minimize the quadratic error function:

\[ E = \sum_{n=0}^{L-1} \left[ e[n] \right]^2 = \sum_{n=0}^{L-1} \left[ x[n] - \sum_{i=1}^{N} a_i x[n-i] \right]^2 \]

That can be done by making all derivatives \partial E / \partial a_i equal to zero:

\[ \frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left[ x[n] - \sum_{i=1}^{N} a_i x[n-i] \right]^2 = 0 \]

The a_i filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation R(m) of the signal x[n]:

\[ R(m) = \sum_{i=0}^{N-1} x[i]\, x[i-m] \]

For an order N filter, we have:

\[ \mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix} \]

The filter coefficients a_i are found by solving the system Ra = r. What the Levinson-Durbin algorithm does here is make the solution to the problem O(N^2) instead of O(N^3) by exploiting the fact that the matrix R is Toeplitz Hermitian. Also, it can be proven that all the roots of A(z) are within the unit circle, which means that 1/A(z) is always stable. This is in theory; in practice, because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply R(0) by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual):

\[ x[n] = \sum_{i=1}^{N} a_i x[n-i] + e[n] \]

In the z-domain, this can be expressed as

\[ x(z) = \frac{1}{A(z)}\, e(z) \]

where A(z) is defined as

\[ A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i} \]

We usually refer to A(z) as the analysis filter and 1/A(z) as the synthesis filter. The whole process is called short-term prediction as it predicts the signal x[n] using only the N past samples, where N is usually around 10.

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization. One of their advantages is that it's easy to keep the filter stable.

6.2 Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal e[n] by a gain times the past of the excitation:

\[ e[n] \approx p[n] = \beta\, e[n-T] \]

where T is the pitch period and β is the pitch gain. We call that long-term prediction since the excitation is predicted from e[n-T] with T ≫ N.
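
Returning to the short-term predictor of section 6.1, the Levinson-Durbin recursion can be sketched in a few lines of C. This is a textbook form of the algorithm, not the routine used inside libspeex, and it uses the sign convention of this section (the prediction is the sum of a_i x[n-i]). The function names, the MAX_ORDER limit, and the choice to treat samples before the start of the buffer as zero in the auto-correlation are all assumptions of the sketch.

```c
#include <assert.h>

#define MAX_ORDER 32   /* arbitrary limit chosen for this sketch */

/* Auto-correlation R[0..order] of x[0..len-1]; samples before the start
   of the buffer are treated as zero (an assumption of this sketch). */
static void autocorr(const float *x, int len, float *R, int order)
{
    int m, i;
    assert(order >= 0 && len > order);
    for (m = 0; m <= order; m++) {
        float sum = 0.0f;
        for (i = m; i < len; i++)
            sum += x[i] * x[i - m];
        R[m] = sum;
    }
}

/* Solve R a = r by the Levinson-Durbin recursion in O(order^2).
   a[i-1] receives the coefficient a_i; the return value is the final
   prediction error energy. */
static float levinson_durbin(const float *R, float *a, int order)
{
    float tmp[MAX_ORDER];
    float err = R[0];
    float acc, k;
    int i, j;
    assert(order >= 1 && order <= MAX_ORDER);
    for (i = 0; i < order; i++)
        a[i] = 0.0f;
    if (err <= 0.0f)
        return 0.0f;               /* silent input: nothing to predict */
    for (i = 0; i < order; i++) {
        /* Reflection coefficient for order i + 1 */
        acc = R[i + 1];
        for (j = 0; j < i; j++)
            acc -= a[j] * R[i - j];
        k = acc / err;
        /* Update the lower-order coefficients, then append the new one */
        for (j = 0; j < i; j++)
            tmp[j] = a[j] - k * a[i - 1 - j];
        for (j = 0; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0f - k * k;
    }
    return err;
}
```

The stabilization trick mentioned in section 6.1 would amount to multiplying R[0] by a factor such as 1.0001 between the two calls.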
6.3 Innovation Codebook

The final excitation e[n] will be the sum of the pitch prediction and an innovation signal c[n] taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

\[ e[n] = p[n] + c[n] = \beta\, e[n-T] + c[n] \]

The quantization of c[n] is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal X(z) as

\[ X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)} \]

6.4 Analysis-by-Synthesis and Error Weighting

Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error

\[ E = \sum_{n} \left( x[n] - \bar{x}[n] \right)^2 \]

where \bar{x}[n] is the encoded signal, we minimize the error for the perceptually weighted signal

\[ X_w(z) = W(z)\, X(z) \]

where W(z) is the weighting filter, usually of the form

\[ W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1) \]

with control parameters γ1 > γ2. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

\[ A_{noise}(z) = \frac{1}{W(z)} = \frac{A(z/\gamma_2)}{A(z/\gamma_1)} \]

If a filter A(z) has (complex) poles at p_i in the z-plane, the filter A(z/γ) will have its poles at p'_i = γ p_i, making it a flatter version of A(z).

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters (T, β) and innovation signal c[n], we do not work by making the excitation e[n] as close as possible to the original one (which would be simpler), but apply the synthesis (and weighting) filter and try making X_w(z) as close to the original as possible.

7 Speex narrowband mode

This section looks at how Speex works for narrowband (8 kHz sampling rate) operation. The frame size for this mode is 20 ms, corresponding to 160 samples. Each frame is also subdivided into 4 sub-frames of 40 samples each.
Also, many design decisions were based on the original goals and assumptions:

- Minimizing the amount of information extracted from past frames (for robustness to packet loss)
- Dynamically-selectable codebooks (LSP, pitch and innovation)
- Sub-vector fixed (innovation) codebooks

7.1 LPC Analysis

An LPC analysis is first performed on an (asymmetric Hamming) window that spans all of the current frame and half a frame in advance. The LPC coefficients are then converted to Line Spectral Pairs (LSP), a representation that is more robust to quantization. The LSPs are considered to be associated with the 4th sub-frame, and the LSPs associated with the first 3 sub-frames are linearly interpolated using the current and previous LSPs.

The LSPs are encoded using 30 bits for higher quality modes and 18 bits for lower quality, through the use of a multi-stage split-vector quantizer. For the lower quality modes, the 10 coefficients are first quantized with 6 bits and the error is then divided in two 5-coefficient sub-vectors. Each of them is quantized with 6 bits, for a total of 18 bits. For the higher quality modes, the remaining error on both sub-vectors is further quantized with 6 bits each, for a total of 30 bits.

The perceptual weighting filter W(z) used by Speex is derived from the LPC filter A(z) and corresponds to the one described by eq. 1 with γ1 = 0.9 and γ2 = 0.6. We can use the unquantized A(z) filter since the weighting filter is only used in the encoder.

7.2 Pitch Prediction (adaptive codebook)

Speex uses a 3-tap prediction for pitch. That is, the pitch prediction signal p[n] is obtained from the past of the excitation by:

\[ p[n] = \beta_0\, e[n-T-1] + \beta_1\, e[n-T] + \beta_2\, e[n-T+1] \]

where T is the pitch period and the β_i are the prediction (filter) taps. It is worth noting that when the pitch is smaller than the sub-frame size, we repeat the excitation at a period T. For example, when n - T + 1 ≥ 0, we use n - 2T + 1 instead. The period and quantized gains are determined in closed loop (analysis-by-synthesis). In most modes, the pitch period is encoded with 7 bits in the [17, 144] range and the β_i coefficients are vector-quantized using 7 bits at higher bit-rates (15 kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband and below).
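
The 3-tap predictor and the repetition rule for short pitch periods can be sketched as follows. This is an illustration of the formula above rather than the libspeex routine: the function names are hypothetical, the gains are passed in unquantized, and the buffer layout (exc pointing at the first sample of the current sub-frame, with at least T + 1 past samples stored before it) is an assumption of the sketch.

```c
#include <assert.h>

/* Fetch e[idx] from the past excitation. Indices >= 0 fall inside the
   current sub-frame, which is not known yet, so they are moved back by
   whole periods T until they point into the past. */
static float past_exc(const float *exc, int idx, int T)
{
    while (idx >= 0)
        idx -= T;
    return exc[idx];
}

/* Hypothetical helper: compute
     p[n] = beta[0]*e[n-T-1] + beta[1]*e[n-T] + beta[2]*e[n-T+1]
   for n = 0..len-1. exc[-1], exc[-2], ..., exc[-(T+1)] must be valid. */
static void pitch_predict_3tap(const float *exc, int T,
                               const float beta[3], float *p, int len)
{
    int n;
    assert(exc && beta && p && T >= 1 && len >= 0);
    for (n = 0; n < len; n++) {
        p[n] = beta[0] * past_exc(exc, n - T - 1, T)
             + beta[1] * past_exc(exc, n - T, T)
             + beta[2] * past_exc(exc, n - T + 1, T);
    }
}
```

With T smaller than len, the output simply repeats the last T samples of the past excitation (weighted by the taps), which is the behaviour described in the text.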
7.3 Innovation Codebook

In Speex, the innovation signal is quantized using sub-vector shape-only vector quantization (VQ). That means that the innovation signal is divided in sub-vectors (of size 5 to 20) and quantized using a codebook that represents both the shape and the gain at the same time. This saves many bits that would otherwise be allocated for a separate gain, at the price of a slight increase in complexity.

7.4 Bit allocation

There are 7 different narrowband bit-rates defined for Speex, ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used for speech. The bit-allocation for each mode is detailed in table 3. Each frame starts with the mode ID encoded with 4 bits which allows a range from 0 to 15, though only the first 7 values are used (the others are reserved). The parameters are listed in the table in the order they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters. The parameters for a certain sub-frame are all packed before the following sub-frame is packed. Note that the "OL" in the parameter description means that the parameter is an open loop estimation based on the whole frame.

    Parameter        Update rate    0    1    2    3    4    5    6    7    8
    Wideband bit     frame          1    1    1    1    1    1    1    1    1
    Mode ID          frame          4    4    4    4    4    4    4    4    4
    LSP              frame          0   18   18   18   18   30   30   30   18
    OL pitch         frame          0    7    7    0    0    0    0    0    7
    OL pitch gain    frame          0    4    0    0    0    0    0    0    4
    OL Exc gain      frame          0    5    5    5    5    5    5    5    5
    Fine pitch       sub-frame      0    0    0    7    7    7    7    7    0
    Pitch gain       sub-frame      0    0    5    5    5    7    7    7    0
    Innovation gain  sub-frame      0    1    0    1    1    3    3    3    0
    Innovation VQ    sub-frame      0    0   16   20   35   48   64   96   10
    Total            frame          5   43  119  160  220  300  364  492   79

    Table 3: Bit allocation for narrowband modes

So far, no MOS (Mean Opinion Score) subjective evaluation has been performed for Speex. In order to give an idea of the quality achievable with it, table 4 presents my own subjective opinion on it. It should be noted that different people will perceive the quality differently and that the person that designed the codec often has a bias (one way or another) when it comes to subjective evaluation. Finally, it should be noted that for most codecs (including Speex) encoding quality sometimes varies depending on the input. Note that the complexity is only approximate (within 0.5 mflops and using the lowest complexity setting). Decoding requires approximately 0.5 mflops in most modes (1 mflops with perceptual enhancement).
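
The totals in table 3 and the bit-rates in table 4 are linked by the 20 ms frame duration: each "sub-frame" row counts 4 times per frame, and 50 frames are sent per second. The short sketch below reproduces that arithmetic; frame_bits and bitrate_bps are hypothetical helper names used only for this illustration.

```c
#include <assert.h>

#define NB_FRAME_MS   20   /* narrowband frame duration (section 7) */
#define NB_SUBFRAMES  4    /* sub-frames per frame (section 7) */

/* Total bits per frame, given the sum of the frame-level rows and the
   sum of the sub-frame-level rows of one column of table 3. */
static int frame_bits(int per_frame_bits, int per_subframe_bits)
{
    assert(per_frame_bits >= 0 && per_subframe_bits >= 0);
    return per_frame_bits + NB_SUBFRAMES * per_subframe_bits;
}

/* Bit-rate in bits per second for a given number of bits per 20 ms frame. */
static int bitrate_bps(int bits_per_frame)
{
    assert(bits_per_frame >= 0);
    return bits_per_frame * 1000 / NB_FRAME_MS;
}
```

For mode 3, the frame-level rows add up to 1 + 4 + 18 + 5 = 28 bits and the sub-frame rows to 7 + 5 + 1 + 20 = 33 bits, so frame_bits(28, 33) gives the 160 bits of the Total row, and bitrate_bps(160) gives the 8,000 bps listed for that mode in table 4.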
    Mode  Bit-rate (bps)  mflops  Quality/description
    0     250             N/A     No transmission (DTX)
    1     2,150           6       Vocoder (mostly for comfort noise)
    2     5,950           9       Very noticeable artifacts/noise, good intelligibility
    3     8,000           10      Artifacts/noise sometimes noticeable
    4     11,000          14      Artifacts usually noticeable only with headphones
    5     15,000          11      Need good headphones to tell the difference
    6     18,200          17.5    Hard to tell the difference even with good headphones
    7     24,600          14.5    Completely transparent for voice, good quality music
    8     3,950           10.5    Very noticeable artifacts/noise, good intelligibility
    9     N/A             N/A     reserved
    10    N/A             N/A     reserved
    11    N/A             N/A     reserved
    12    N/A             N/A     reserved
    13    N/A             N/A     Application-defined, interpreted by callback or skipped
    14    N/A             N/A     Speex in-band signaling
    15    N/A             N/A     Terminator code

    Table 4: Quality versus bit-rate

7.5 Perceptual enhancement

This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. For that reason, the implementation provided and described here should only be considered as a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter S(z) = 1/A(z) is replaced by an enhanced filter

\[ S'(z) = \frac{A(z/a_2)\, A(z/a_3)}{A(z)\, A(z/a_1)} \]

where a_1 and a_2 depend on the mode in use and

\[ a_3 = \frac{1}{r} \left( 1 - \frac{1 - r a_1}{1 - r a_2} \right) \]

with r = 0.9. The second part of the enhancement consists of using a comb filter to enhance the pitch in the excitation domain.

8 Speex wideband mode (sub-band CELP)

For wideband, the Speex approach uses a quadrature mirror filter (QMF) to split the band in two. The 16 kHz signal is thus divided into two 8 kHz signals, one representing the low band (0-4 kHz), the other the high band (4-8 kHz). The low band is encoded with the narrowband mode described in section 7 in such a way that the resulting "embedded narrowband bit-stream" can also be decoded with the narrowband decoder. Since the low band encoding has already been described, only the high band encoding is described in this section.
8.1 Linear Prediction

The linear prediction part used for the high-band is very similar to what is done for narrowband. The only difference is that we use only 12 bits to encode the high-band LSPs using a multi-stage vector quantizer (MSVQ). The first level quantizes the 10 coefficients with 6 bits and the error is then quantized using 6 bits, too.

8.2 Pitch Prediction

That part is easy: there's no pitch prediction for the high-band. There are two reasons for that. First, there is usually little harmonic structure in this band (above 4 kHz). Second, it would be very hard to implement since the QMF folds the 4-8 kHz band into 4-0 kHz (reversing the frequency axis), which means that the harmonics are no longer located at multiples of the fundamental (pitch).

8.3 Excitation Quantization

The high-band excitation is coded in the same way as for narrowband.

8.4 Bit allocation

For the wideband mode, the entire narrowband frame is packed before the high-band is encoded. The narrowband part of the bit-stream is as defined in table 3. The high-band follows, as described in table 5. This also means that a wideband frame may be correctly decoded by a narrowband decoder with the only caveat that if more than one frame is packed in the same packet, the decoder will need to skip the high-band parts in order to sync with the bit-stream.

    Parameter        Update rate    0    1    2    3    4
    Wideband bit     frame          1    1    1    1    1
    Mode ID          frame          3    3    3    3    3
    LSP              frame          0   12   12   12   12
    Excitation gain  sub-frame      0    5    4    4    4
    Excitation VQ    sub-frame      0    0   20   40   80
    Total            frame          4   36  112  192  352

    Table 5: Bit allocation for high-band in wideband mode

A FAQ

Vorbis is open-source and patent-free; why do we need Speex?

Vorbis is a great project but its goals are not the same as Speex. Vorbis is mostly aimed at compressing music and audio in general, while Speex targets speech only. For that reason Speex can achieve much better results than Vorbis on speech, typically 2-4 times higher compression at equal quality.

Isn't there a GPL implementation of the GSM-FR codec? Why is Speex necessary?
First of all, it's not clear whether GSM-FR is covered by a Philips patent (see http://kbs.cs.tu-berlin.de/~jutta/toast.html). Also, GSM-FR offers mediocre quality at a relatively high bit-rate, while Speex can offer equivalent quality at almost half the bit-rate. Last but not least, Speex offers a wide range of bit-rates and sampling rates, while GSM-FR is limited to 8 kHz speech at 13 kbps.

Under what license is Speex released?

As of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license (see Appendix D). This license is the most permissive of the open-source licenses.

Am I allowed to use Speex in commercial software?

Yes, as long as you comply with the license. This basically means you have to keep the copyright notice and you can't use our name to promote your product without authorization. For more details, see the license in Appendix D.

Ogg, Speex, Vorbis, what's the difference?

Ogg is a container format for holding multimedia data. Vorbis is an audio codec that uses Ogg to store its bit-streams as files, hence the name Ogg Vorbis. Speex also uses the Ogg format to store its bit-streams as files, so technically they would be "Ogg Speex" files (I prefer to call them just Speex files). One difference with Vorbis, however, is that Speex is less tied to Ogg. Actually, if what you do is Voice over IP (VoIP), you don't need Ogg at all.

What's the extension for Speex?

Speex files have the .spx extension. Note, however, that the Speex tools (speexenc, speexdec) do not rely on the extension at all, so any extension will work.

Can I use Speex for compressing music?

Just like Vorbis is not really adapted to speech, Speex is really not adapted for music. In most cases, you'll be better off with Vorbis when it comes to music.

I converted some MP3s to Speex and the quality is bad. What's wrong?

This is called transcoding and it will always result in much poorer quality than the original MP3. Unless you have a really good (size) reason to do so, never transcode speech. This is even valid for self transcoding (tandeming), i.e. if you decode a Speex file and re-encode it again at the same bit-rate, you will lose quality.

Does Speex run on Windows?

Compilation on Windows has been supported since version 0.8.0. There are also several front-ends available from the website.

Why is encoding so slow compared to decoding?
For most kinds of compression, encoding is inherently slower than decoding. In the case of Speex, encoding consists of finding, for each vector of 5 to 10 samples, the entry that matches the best within a codebook consisting of 16 to 256 entries. On the other hand, at decoding all that needs to be done is look up the right entry in the codebook using the encoded index. Since a lookup is much faster than a search, the decoder works much faster than the encoder.

Why is Speex so slow on my iPaq (or insert any platform without an FPU)?

Well, the parenthesis provides the answer: no FPU (floating-point unit). The Speex code makes heavy use of floating-point operations. On devices with no FPU, all floating-point instructions need to be emulated. This is a very time consuming operation.

I'm getting unusual background noise (hiss) when using libspeex in my application. How do I fix that?

One of the causes could be scaling of the input speech. Speex expects signals to have a ±2^15 (signed short) dynamic range. If the dynamic range of your signals is too small (e.g. ±1.0), you will suffer significant quantization noise. A good target is to have a dynamic range around ±8000, which is large enough, but small enough to make sure there's no clipping when converting back to signed short.

I get very distorted speech when using libspeex in my application. What's wrong?

There are many possible causes for that. One of them is errors in the way the bits are manipulated. Another possible cause is the use of the same encoder or decoder state for more than one audio stream (channel), which produces strange effects with the filter memories. If the input speech has an amplitude close to ±2^15, it is possible that at decoding, the amplitude is a bit higher than that, causing clipping when saving as 16-bit PCM.

How does Speex compare to other proprietary codecs?
It's hard to give precise figures since no formal listening tests have been performed yet. All I can say is that in terms of quality, Speex competes on the same ground as other proprietary codecs (not necessarily the best, but not the worst either). Speex also has many features that are not present in most other codecs. These include variable bit-rate (VBR), integration of narrowband and wideband, as well as stereo support. Of course, another area where Speex is really hard to beat is the quality/price ratio. Unlike many very expensive codecs, Speex is free and anyone may distribute/modify it at will.

Can Speex pass DTMF?

I guess it all depends on the bit-rate used. Though no formal testing has yet been performed, I'd say don't go below the 15 kbps mode if you want DTMF to be transmitted correctly. DTMF at 8 kbps may work but your mileage may vary. Also, make sure you don't use the lowest complexity (see SPEEX_SET_COMPLEXITY or the --comp option), as it causes significant noise.

Can Speex pass V.9x modem signals correctly?

If I could do that I'd be very rich by now :-)

What is your (Jean-Marc) relationship with the University of Sherbrooke and how does Speex fit into that?

Currently (2003/03/09), I'm doing a Ph.D. at the University of Sherbrooke in mobile robotics. Although I did my master's with the Sherbrooke speech coding group (in speech enhancement, not coding), I am not associated with them anymore. It should not be understood that they or the University of Sherbrooke endorse the Speex project in any way. Furthermore, Speex does not make use of any code or proprietary technology developed in the Sherbrooke speech coding group.

CELP, ACELP, what's the difference?

CELP stands for "Code Excited Linear Prediction", while ACELP stands for "Algebraic Code Excited Linear Prediction". That means ACELP is a CELP technique that uses an algebraic codebook represented as a sum of unit pulses, thus making the codebook search much more efficient. This technique was invented at the University of Sherbrooke and is now one of the most widely used forms of CELP. Unfortunately, since it is patented, it cannot be used in Speex.
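Two of the FAQ answers above (background hiss and distorted speech) come down to sample scaling, so a small illustration may help. The following is only a sketch under stated assumptions: the helper names scale_for_speex and clip_to_short are hypothetical and are not part of libspeex, and the input is assumed to be normalized to ±1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libspeex function): scale samples that are
   normalized to +/-1.0 up to a target peak such as the +/-8000 range
   suggested in the FAQ. */
static void scale_for_speex(const float *in, float *out, size_t n,
                            float target_peak)
{
    size_t i;
    assert(target_peak > 0.0f);
    for (i = 0; i < n; i++)
        out[i] = in[i] * target_peak;
}

/* Hypothetical helper (not a libspeex function): convert one decoded
   float sample to signed 16-bit PCM, saturating instead of wrapping
   when the decoder overshoots the 16-bit range. */
static short clip_to_short(float x)
{
    if (x > 32767.0f)
        return 32767;
    if (x < -32768.0f)
        return -32768;
    return (short)x;
}
```

In an application, scale_for_speex would typically run on each frame before encoding and clip_to_short on each sample after decoding, in place of a plain float-to-short assignment.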
B Sample code

This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:

% sampleenc in_file.sw | sampledec out_file.sw

where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness).

B.1 sampleenc.c

sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is NOT compatible with that of speexenc/speexdec.

   #include <speex.h>
   #include <stdio.h>

   /*The frame size is hardcoded for this sample code but it doesn't have to be*/
   #define FRAME_SIZE 160

   int main(int argc, char **argv)
   {
      char *inFile;
      FILE *fin;
      short in[FRAME_SIZE];
      float input[FRAME_SIZE];
      char cbits[200];
      int nbBytes;
      /*Holds the state of the encoder*/
      void *state;
      /*Holds bits so they can be read and written to by the Speex routines*/
      SpeexBits bits;
      int i, tmp;

      /*Create a new encoder state in narrowband mode*/
      state = speex_encoder_init(&speex_nb_mode);

      /*Set the quality to 8 (15 kbps)*/
      tmp = 8;
      speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

      inFile = argv[1];
      fin = fopen(inFile, "r");

      /*Initialization of the structure that holds the bits*/
      speex_bits_init(&bits);
      while (1)
      {
         /*Read a 16 bits/sample audio frame*/
         fread(in, sizeof(short), FRAME_SIZE, fin);
         if (feof(fin))
            break;
         /*Copy the 16 bits values to float so Speex can work on them*/
         for (i = 0; i < FRAME_SIZE; i++)
            input[i] = in[i];

         /*Flush all the bits in the struct so we can encode a new frame*/
         speex_bits_reset(&bits);

         /*Encode the frame*/
         speex_encode(state, input, &bits);
         /*Copy the bits to an array of char that can be written*/
         nbBytes = speex_bits_write(&bits, cbits, 200);

         /*Write the size of the frame first. This is what sampledec expects but
           it's likely to be different in your own application*/
         fwrite(&nbBytes, sizeof(int), 1, stdout);
         /*Write the compressed data*/
         fwrite(cbits, 1, nbBytes, stdout);
      }

      /*Destroy the encoder state*/
      speex_encoder_destroy(state);
      /*Destroy the bit-packing struct*/
      speex_bits_destroy(&bits);
      fclose(fin);
      return 0;
   }

B.2 sampledec.c

sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is NOT compatible with that of speexenc/speexdec.
   #include <speex.h>
   #include <stdio.h>

   /*The frame size is hardcoded for this sample code but it doesn't have to be*/
   #define FRAME_SIZE 160

   int main(int argc, char **argv)
   {
      char *outFile;
      FILE *fout;
      /*Holds the audio that will be written to file (16 bits per sample)*/
      short out[FRAME_SIZE];
      /*Speex handles samples as float, so we need an array of floats*/
      float output[FRAME_SIZE];
      char cbits[200];
      int nbBytes;
      /*Holds the state of the decoder*/
      void *state;
      /*Holds bits so they can be read and written to by the Speex routines*/
      SpeexBits bits;
      int i, tmp;

      /*Create a new decoder state in narrowband mode*/
      state = speex_decoder_init(&speex_nb_mode);

      /*Set the perceptual enhancement on*/
      tmp = 1;
      speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

      outFile = argv[1];
      fout = fopen(outFile, "w");

      /*Initialization of the structure that holds the bits*/
      speex_bits_init(&bits);
      while (1)
      {
         /*Read the size encoded by sampleenc, this part will likely be
           different in your application*/
         fread(&nbBytes, sizeof(int), 1, stdin);
         fprintf(stderr, "nbBytes: %d\n", nbBytes);
         if (feof(stdin))
            break;

         /*Read the "packet" encoded by sampleenc*/
         fread(cbits, 1, nbBytes, stdin);
         /*Copy the data into the bit-stream struct*/
         speex_bits_read_from(&bits, cbits, nbBytes);

         /*Decode the data*/
         speex_decode(state, &bits, output);

         /*Copy from float to short (16 bits) for output*/
         for (i = 0; i < FRAME_SIZE; i++)
            out[i] = output[i];

         /*Write the decoded audio to file*/
         fwrite(out, sizeof(short), FRAME_SIZE, fout);
      }

      /*Destroy the decoder state (the decoder, not the encoder, owns this state)*/
      speex_decoder_destroy(state);
      /*Destroy the bit-stream struct*/
      speex_bits_destroy(&bits);
      fclose(fout);
      return 0;
   }

C IETF RTP Profile

Internet Engineering Task Force
Internet Draft
draft-herlein-avt-rtp-speex-00.txt
March 3, 2004
Expires: September 3, 2004

Greg Herlein
Jean-Marc Valin
Simon Morlat
Roger Hardiman
Phil Kerr

RTP Payload Format for the Speex Codec

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Speex is an open-source voice codec suitable for use in Voice over IP (VoIP) type applications. This document describes the payload format for Speex generated bit streams within an RTP packet. Also included here are the necessary details for the use of Speex with the Session Description Protocol (SDP) and a preliminary method of using Speex within H.323 applications.

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].

2. Overview of the Speex Codec

Speex is based on the CELP [12] encoding technique with support for either narrowband (nominal 8 kHz), wideband (nominal 16 kHz) or ultra-wideband (nominal 32 kHz), and (non-optimal) rates up to 48 kHz sampling also available. The main characteristics can be summarized as follows:

   o Free software/open-source
   o Integration of wideband and narrowband in the same bit-stream
   o Wide range of bit-rates available
   o Dynamic bit-rate switching and variable bit-rate (VBR)
   o Voice Activity Detection (VAD, integrated with VBR)
   o Variable complexity

3. RTP payload format for Speex

For RTP based transportation of Speex encoded audio the standard RTP header [2] is followed by one or more payload data blocks. An optional padding terminator may also be used.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |               one or more frames of Speex ....                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       one or more frames of Speex ....        |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1 RTP Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The RTP header begins with an octet of fields (V, P, X, and CC) to support specialized RTP uses (see [8] and [9] for details). For Speex the following values are used.

Version (V): 2 bits
   This field identifies the version of RTP. The version used by this specification is two (2).

Padding (P): 1 bit
   If the padding bit is set, the packet contains one or more additional padding octets at the end which are not part of the payload. P is set if the total packet size is less than the MTU.

Extension (X): 1 bit
   If the extension, X, bit is set, the fixed header MUST be followed by exactly one header extension, with a format defined in Section 5.3.1 of [8].

CSRC count (CC): 4 bits
   The CSRC count contains the number of CSRC identifiers.
Marker (M): 1 bit
   The M bit indicates if the packet contains comfort noise. This field is used in conjunction with the cng SDP attribute and is detailed further in section 5 below. In normal usage this bit is set if the packet contains comfort noise.

Payload Type (PT): 7 bits
   An RTP profile for a class of applications is expected to assign a payload type for this format, or a dynamically allocated payload type SHOULD be chosen which designates the payload as Speex.

Sequence number: 16 bits
   The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. This field is detailed further in [2].

Timestamp: 32 bits
   A timestamp representing the sampling time of the first sample of the first Speex packet in the RTP packet. The clock frequency MUST be set to the sample rate of the encoded audio data.

   Speex uses 20 msec frames and a variable sampling rate clock. The RTP timestamp MUST be in units of 1/X of a second where X is the sample rate used. Speex uses a nominal 8 kHz sampling rate for narrowband use, a nominal 16 kHz sampling rate for wideband use, and a nominal 32 kHz sampling rate for ultra-wideband use.

SSRC/CSRC identifiers:
   These two fields, 32 bits each with one SSRC field and a maximum of 16 CSRC fields, are as defined in [2].

3.2 Speex payload

For the purposes of packetizing the bit stream in RTP, it is only necessary to consider the sequence of bits as output by the Speex encoder [11], and present the same sequence to the decoder. The payload format described here maintains this sequence.

A typical Speex frame, encoded at the maximum bitrate, is approx. 110 octets and the total size of the Speex frames in a packet SHOULD be kept below the path MTU to prevent fragmentation. Speex frames MUST NOT be fragmented across multiple RTP packets.

An RTP packet MAY contain Speex frames of the same bit rate or of varying bit rates, since the bit-rate for a frame is conveyed in band with the signal.
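The timestamp rule from section 3.1 and the frame size figure quoted above can be turned into simple arithmetic. The following is a minimal sketch, not part of the draft: the helper names rtp_timestamp_step and max_frames_for_payload are hypothetical, and the payload budget is assumed to be whatever remains of the path MTU after the caller has subtracted the IP, UDP and RTP header overhead.

```c
#include <assert.h>

/* Speex frames always cover 20 ms of audio (stated in the draft). */
#define FRAME_DURATION_MS 20

/* Worst-case frame size in octets, using the approximate figure
   quoted in the draft for the maximum bit rate. */
#define MAX_FRAME_OCTETS 110

/* Hypothetical helper: number of RTP timestamp units covered by one
   packet. The RTP clock runs at the audio sample rate, so a 20 ms
   frame spans sample_rate / 50 units. */
static unsigned long rtp_timestamp_step(unsigned long sample_rate_hz,
                                        unsigned int frames_per_packet)
{
    assert(sample_rate_hz > 0 && frames_per_packet > 0);
    return sample_rate_hz * FRAME_DURATION_MS / 1000 * frames_per_packet;
}

/* Hypothetical helper: conservative number of whole frames that fit in
   a payload budget. The caller is assumed to have already subtracted
   the IP, UDP and RTP header sizes from the path MTU. */
static unsigned int max_frames_for_payload(unsigned int payload_octets)
{
    return payload_octets / MAX_FRAME_OCTETS;
}
```

For narrowband audio with one frame per packet, the timestamp therefore advances by 160 between consecutive packets; with two wideband frames per packet it advances by 640.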
The encoding and decoding algorithm can change the bit rate at any 20 msec frame boundary, with the bit rate change notification provided in-band with the bit stream. Each frame contains both "mode" (narrowband, wideband or ultra-wideband) and "sub-mode" (bit-rate) information in the bit stream. No out-of-band notification is required for the decoder to process changes in the bit rate sent by the encoder.

It is RECOMMENDED that values of 8000, 16000 and 32000 be used for normal internet telephony applications, though the sample rate is supported at rates as low as 6000 Hz and as high as 48 kHz.

The RTP payload MUST be padded to provide an integer number of octets as the payload length. These padding bits are LSB aligned in network byte order and consist of a 0 followed by all ones (until the end of the octet). This padding is only required for the last frame in the packet, and only to ensure the packet contents ends on an octet boundary.

3.2.1 Example Speex packet

In the example below we have a single Speex frame with 5 bits of padding to ensure the packet size falls on an octet boundary.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   ..speex data..                    |0 1 1 1 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.4 Multiple Speex frames in a RTP packet
Below is an example of two Speex frames contained within one RTP packet. The Speex frame lengths in this example fall on an octet boundary so there is no padding.

Speex codecs [11] are able to detect the bit rate from the payload and are responsible for detecting the 20 msec boundaries between each frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        ..speex data..         |        ..speex data..         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ..speex data..                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. MIME registration of Speex

Full definition of the MIME type for Speex will be part of the Ogg Vorbis MIME type definition application [10].

MIME media type name: audio
MIME subtype: speex
Optional parameters:
Required parameters: to be included in the Ogg MIME specification.
Encoding considerations:
Security Considerations: See Section 6 of RFC 3047.
Interoperability considerations: none
Published specification:
Applications which use this media type:
Additional information: none
Person & email address to contact for further information:
   Greg Herlein
   Jean-Marc Valin
Author/Change controller:
   Author: Greg Herlein
   Change controller: Greg Herlein
This transport type signifies that the content is to be interpreted according to this document if the contents are transmitted over RTP. Should this transport type appear over a lossless streaming protocol such as TCP, the content encapsulation should be interpreted as an Ogg Stream in accordance with RFC 3534, with the exception that the content of the Ogg Stream may be assumed to be Speex audio and Speex audio only.

5. SDP usage of Speex

When conveying information by SDP [4], the encoding name MUST be set to "speex". An example of the media representation in SDP for offering a single channel of Speex at 8000 samples per second might be:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000

Note that the RTP payload type code of 97 is defined in this media definition to be 'mapped' to the speex codec at an 8 kHz sampling frequency using the 'a=rtpmap' line. Any number from 96 to 127 could have been chosen (the allowed range for dynamic types).

The value of the sampling frequency is typically 8000 for narrowband operation, 16000 for wideband operation, and 32000 for ultra-wideband operation.

If for some reason the offerer has bandwidth limitations, the client may use the "b=" header, as explained in SDP [4]. The following example illustrates the case where the offerer cannot receive more than 10 kbit/s.

   m=audio 8088 RTP/AVP 97
   b=AS:10
   a=rtpmap:97 speex/8000

In this case, if the remote party agrees, it should configure its Speex encoder so that it does not use modes that produce more than 10 kbit/s. Note that the "b=" constraint also applies on all payload types that may be proposed in the media line ("m=").

Another way to make recommendations to the remote Speex encoder is to use its specific parameters via the a=fmtp: directive. The following parameters are defined for use in this way:

   ptime: duration of each packet in milliseconds.

   sr: actual sample rate in Hz.

   ebw: encoding bandwidth - either 'narrow' or 'wide' or 'ultra' (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).
   vbr: variable bit rate - either 'on', 'off' or 'vad' (defaults to off). If on, variable bit rate is enabled. If off, disabled. If set to 'vad' then constant bit rate is used but silence will be encoded with special short frames to indicate a lack of voice for that period.

   cng: comfort noise generation - either 'on' or 'off'. If off then silence frames will be silent; if 'on' then those frames will be filled with comfort noise.

   mode: Speex encoding mode. Can be {1,2,3,4,5,6,any}; defaults to 3 in narrowband, 6 in wide and ultra-wide.

   penh: use of perceptual enhancement. 1 indicates to the decoder that perceptual enhancement is recommended, 0 indicates that it is not. Defaults to on (1).

Examples:

   m=audio 8008 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 mode=4

This example illustrates an offerer that wishes to receive a Speex stream at 8000 Hz, but only using speex mode 4.

The offerer may suggest to the remote decoder to activate its perceptual enhancement filter like this:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 penh=1

Several Speex specific parameters can be given in a single a=fmtp line provided that they are separated by a semi-colon:

   a=fmtp:97 mode=any;penh=1

The offerer may indicate that it wishes to send variable bit rate frames with comfort noise:

   m=audio 8088 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=fmtp:97 vbr=on;cng=on

The "ptime" attribute is used to denote the packetization interval (i.e., how many milliseconds of audio is encoded in a single RTP packet). Since Speex uses 20 msec frames, ptime values of multiples of 20 denote multiple Speex frames per packet. Values of ptime which are not multiples of 20 MUST be ignored and clients MUST use the default value of 20 instead.
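The ptime rule above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper name frames_per_packet_from_ptime is hypothetical, and treating zero or negative values as invalid is an assumption of this sketch that the draft does not spell out.

```c
#include <assert.h>

/* Speex frames always cover 20 ms of audio (stated in the draft). */
#define FRAME_MS 20

/* Hypothetical helper: map an SDP ptime value (in milliseconds) to the
   number of Speex frames per RTP packet. Values that are not multiples
   of 20 are ignored and the default of 20 is used instead, as the
   draft requires. Zero and negative values are also mapped to the
   default, which is an assumption of this sketch. */
static int frames_per_packet_from_ptime(int ptime_ms)
{
    int effective = ptime_ms;

    if (ptime_ms <= 0 || ptime_ms % FRAME_MS != 0)
        effective = FRAME_MS;

    assert(effective % FRAME_MS == 0);
    return effective / FRAME_MS;
}
```

With this helper, a=ptime:40 yields two frames per packet, while an invalid value such as 30 falls back to a single 20 msec frame.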
In the example below the ptime value is set to 40, indicating that there are 2 frames in each packet.

   m=audio 8008 RTP/AVP 97
   a=rtpmap:97 speex/8000
   a=ptime:40

Note that the ptime parameter applies to all payloads listed in the media line and is not used as part of an a=fmtp directive.

Values of ptime not multiple of 20 msec are meaningless, so the receiver of such ptime values MUST ignore them. If during the life of an RTP session the ptime value changes, when there are multiple Speex frames for example, the SDP value must also reflect the new value.

Care must be taken when setting the value of ptime so that the RTP packet size does not exceed the path MTU.

6. ITU H.323/H.245 Use of Speex

Application is underway to make Speex a standard ITU codec. However, until that is finalized, Speex MAY be used in H.323 [6] by using a non-standard codec block definition in the H.245 [7] codec capability negotiations.

6.1 NonStandardMessage format

For Speex use in H.245 [7] based systems, the fields in the NonStandardMessage should be:

   t35CountryCode = Hex: B5
   t35Extension = Hex: 00
   manufacturerCode = Hex: 0026
   [Length of the Binary Sequence (8 bit number)]
   [Binary Sequence consisting of an ASCII string, no NULL terminator]

The binary sequence is an ascii string merely for ease of use. The string is not null terminated. The format of this string is

   speex [optional variables]

The optional variables are identical to those used for the SDP a=fmtp strings discussed in section 5 above. The string is built to be all on one line, each key-value pair separated by a semi-colon. The optional variables MAY be omitted, which causes the default values to be assumed. They are:

   ebw=narrow;mode=3;vbr=off;cng=off;ptime=20;sr=8000;penh=no;

The fifth byte of the block is the length of the binary sequence.

NOTE: this method can result in the advertising of a large number of Speex 'codecs' based on the number of variables possible. For most VoIP applications, use of the default binary sequence of 'speex' is RECOMMENDED to be used in addition to all other options. This maximizes the chances that two H.323 based applications that support Speex can find a mutual codec.
6.2 RTP Payload Types

Dynamic payload type codes MUST be negotiated 'out-of-band' for the assignment of a dynamic payload type from the range of 96-127. H.323 applications MUST use the H.245 H2250LogicalChannelParameters encoding to accomplish this.

7. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [2], and any appropriate RTP profile. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to be overloaded. However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high.

8. Normative References

1. Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

2. Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for real-time applications", RFC 1889, January 1996.

3. Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

4. Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

5. Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

6. ITU-T Recommendation H.323, "Packet-based Multimedia Communications Systems", 1998.
7. ITU-T Recommendation H.245 (1998), "Control of communications between Visual Telephone Systems and Terminal Equipment".

8. RTP: A transport protocol for real-time applications. Work in progress, draft-ietf-avt-rtp-new-12.txt.

9. RTP Profile for Audio and Video Conferences with Minimal Control. Work in progress, draft-ietf-avt-profile-new-13.txt.

10. L. Walleij, "The application/ogg Media Type", RFC 3534, May 2003.

8.1 Informative References

11. Speexenc/speexdec, reference command-line encoder/decoder, Speex web site, http://www.speex.org/

12. CELP, U.S. Federal Standard 1016. National Technical Information Service (NTIS) web site, http://www.ntis.gov/

9. Acknowledgments

The authors would like to thank Equivalence Pty Ltd of Australia for their assistance in attempting to standardize the use of Speex in H.323 applications, and for implementing Speex in their open source OpenH323 stack. The authors would also like to thank Brian C. Wiles

The authors would also like to thank the following members of the Speex and AVT communities for their input: Ross Finlayson, Federico Montesino Pouzols, Henning Schulzrinne, Magnus Westerlund.

10. Author's Address

Greg Herlein

Jean-Marc Valin
Sherbrooke, Quebec, Canada, J1K 2R1

Simon MORLAT

Roger Hardiman
Gloucestershire GL51 6NR England

Phil Kerr

10. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.

D Speex License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the Xiph.org Foundation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

E GNU Free Documentation License

Version 1.1, March 2000

Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications", preserve the section’s title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties – for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document’s Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.
8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
Index

ACELP, 28
algorithmic delay, 8
analysis-by-synthesis, 19
API, 11
auto-correlation, 18
average bit-rate, 7, 14
bit-rate, 23
CELP, 6, 17
complexity, 6, 7, 22, 23
constant bit-rate, 7
discontinuous transmission, 8, 14
DTMF, 7, 28
error weighting, 19
in-band signalling, 15
Levinson-Durbin, 18
libspeex, 11
line spectral pair, 19, 21
linear prediction, 17, 21
mean opinion score, 22
music, 27
narrowband, 6, 7, 21
Ogg, 16, 26
open-source, 6, 26
patent, 6, 26
perceptual enhancement, 8, 13, 23
pitch, 19, 21
quadrature mirror filter, 24
quality, 7
RTP, 16
sampling rate, 7
speexdec, 10
speexenc, 9
standards, 16
ultra-wideband, 7
variable bit-rate, 6, 7, 13
voice activity detection, 6, 8, 14
Vorbis, 26
wideband, 6, 7, 24