Discriminative Clustering for Market Segmentation

2020-02-28 Source: 好走旅游网


Peter Haider∗
Dep. of Computer Science, Univ. of Potsdam, Germany
haider@cs.uni-potsdam.de

Luca Chiarandini†
Web Research Group, Universitat Pompeu Fabra, Barcelona, Spain
chiarluc@yahoo-inc.com

Ulf Brefeld
University of Bonn, Bonn, Germany
brefeld@uni-bonn.de

ABSTRACT

We study discriminative clustering for market segmentation tasks. The underlying problem setting resembles discriminative clustering; however, existing approaches focus on the prediction of univariate cluster labels. By contrast, market segments encode complex (future) behavior of the individuals which cannot be represented by a single variable. In this paper, we generalize discriminative clustering to structured and complex output variables that can be represented as graphical models. We devise two novel methods to jointly learn the classifier and the clustering using alternating optimization and collapsed inference, respectively. The two approaches jointly learn a discriminative segmentation of the input space and a generative output prediction model for each segment. We evaluate our methods on segmenting user navigation sequences from Yahoo! News. The proposed collapsed algorithm is observed to outperform baseline approaches such as mixture of experts. We showcase exemplary projections of the resulting segments to display the interpretability of the solutions.

Categories and Subject Descriptors

I.5.3 [Clustering]: Algorithms

General Terms

Algorithms, Experimentation

Keywords

Discriminative Clustering, Market Segmentation

1. INTRODUCTION

Market segmentation reveals divisions in a given market, where a market refers to a population of interest such as …

… discriminated from each other by the classifier. Combining the two criteria thus guarantees concise clusterings and accurate classifiers. Existing approaches focus on clustering a population and predicting a cluster label for a new instance. By contrast, market segmentation is more complex. In market segmentation tasks, we need to differentiate between the data that characterizes individuals and the data that characterizes their future behavior. The clustering clearly needs to take all available information into account to generate meaningful segments. However, the classifier does not have access to future events and needs to take a decision on the available information such as gender, income, etc. This observation renders existing approaches to discriminative clustering too restrictive for market segmentation tasks.

In this paper we generalize discriminative clustering for market segmentation tasks using the structured prediction framework. We differentiate between attributes of a customer and her interests/behavior. Attributes are a priori available features of individuals of the population such as gender or income. Her behavior is a collection of interacting variables describing a segment. As segments need to be interpretable, we model the output data as a complex and structured variable which can be represented as a graphical model. The distinction allows for learning a classifier only on the attributes, computing the clustering on both attributes and behavior, and finally summarizing the segments only in terms of the behavior.

We devise two solutions which are based on the regularized empirical risk minimization framework. The first is a straightforward adaptation of mixtures of experts. Classifier and clustering are optimized using an alternating strategy where we fix one component while optimizing the other. The second solution uses approximations and integrates out parameters of the classifier using collapsed inference for efficiency. Both approaches use generative models for the output structure and, in contrast to conventional discriminative clustering approaches, do not involve trade-off parameters for classification accuracy and cluster consistency (class balance) because the optimization problems are not prone to trivial and degenerate solutions.

Use cases of our methods include traditional market segmentation tasks. Consider for instance a company that aims at promoting a new product or a hotel chain that intends to lure visitors with special offers. Our methods not only compute a meaningful segmentation of the customers but also allow for devising appropriate targeting strategies from the graphical models. Moreover, our method serves as discriminative clustering for structured variables, where the task is not to output a single class/cluster label but the average structure for every segment. The differentiation between attributes and behavior increases the range of applications that can be addressed. A special, but still novel, case is obtained when attributes and behavior partially overlap. Empirically, we study our methods on another interesting use case: segmenting user navigation sessions on the Web for displaying segment-specific website layouts. We experiment on a large click log from Yahoo! News. The attribute data is assembled from meta-information about the session such as the timestamp, the referrer domain, and the first page request. The behavior consists of subsequent navigation actions given by click sequences. The generative representation of the behavior data is interpretable and can be easily transformed into segment-specific layouts.

The remainder of the paper is structured as follows. Section 2 discusses the relationship of our problem setting with previously studied settings and methods. In Section 3 we derive two algorithms to optimize the empirical counterpart of the expected segmented log-likelihood. Section 4 reports on empirical results using a large click log from a commercial news provider and Section 5 concludes.

2. RELATED WORK

Market segmentation tasks are often solved using neural networks such as self-organizing maps (13; 7). Kiang et al. (13) for instance extend self-organizing maps to group customers according to their attitude towards different communications modes. D'Urso and de Giovanni (7) use the natural clustering property of self-organizing maps together with dissimilarity measures which capture temporal structure of the data. In general, clustering data with self-organizing maps and variants thereof inherently implements the homogeneity assumption of market segmentation. However, classifying new instances into the clustering is often difficult and it is not possible to output generative models to summarize the resulting clusters. Additionally, the optimization criterion of self-organizing maps is highly sensitive to the actual initialization and usually converges to different local optima.

Related to market segmentation is the task of estimating a mixture model for observations (6). Introducing selector variables encoding probabilistic cluster-memberships, maximizing the log-likelihood by marginalizing over the selector is usually straightforward. The selector can be modeled in a data-dependent or data-independent fashion, but the probabilistic nature of the cluster-memberships renders a direct application for market segmentation tasks impossible.

Discriminative clustering simultaneously computes a segmentation of the data at hand and a classifier that discriminates the resulting clusters well. Existing approaches include projections into lower-dimensional subspaces (5), joint optimization of max-margin classifiers and clusterings (20; 24), the optimization of scatter metrics (21), and the maximization of an information theoretic criterion to balance class separation and classifier complexity (8). Sinkkonen et al. (17) aim to find clusters that are homogeneous in auxiliary data given by additional discrete variables. The above-mentioned approaches do not predict any output variable but focus on the discrete cluster variable. Moreover, in our generalized problem setting, instances are represented as input-output pairs. The classifier discriminates the clusters given only the input, whereas the cluster parameters need to accurately estimate the outputs of the contained instances. Previous work on discriminative clustering does not split instances into two parts. Instead, it represents instances as a single input, which consequently allows the classifiers to access the whole example at decision time. The same assumption is commonly made in market segmentation studies that involve model-based clustering approaches (9; 16; 22), but prohibits a natural solution for market segmentation tasks.

This problem setting can be seen as an alteration of the setting which the Mixture of Experts approach (10; 12) aims to solve, where the behavior $y$ is predicted given the attributes $x$ as a mixture model where the mixture component weights depend again on $x$. In our case, mixture component weights always have to be point distributions as demanded by the application. Framing the distribution of $y$ given the mixture component as a pure generative model allows us to derive a more efficient algorithm than that of the Mixture of Experts approach.

Zhao et al. (23) proposed a maximum-margin clustering for multivariate loss functions. Minimizing the complex losses allows for capturing structural differences that cannot be expressed in terms of standard misclassification rates. In principle, by defining a loss function that captures the differences of two clusterings, one could possibly solve market segmentation tasks as their approach implicitly favors clusterings that are easily discriminable. However, the loss function cannot be expressed in terms of the contingency table of the two clusterings, and the decoding problem in the inner loop of the algorithm, that is finding the most violated constraint, becomes intractable in practice.

Also related to our problem setting are multi-view clustering approaches, where the data is split into two disjoint feature sets, which are sometimes also called views. Bickel and Scheffer (2) present an intertwined Expectation Maximization algorithm to compute the maximum likelihood solution for one view using the expectations provided by its peer view. The two data views are modeled generatively and the algorithm maximizes the joint marginal likelihood. By contrast, we aim to find a discriminative classifier on the input view and instead of maximizing the marginal likelihood of the output view we seek to maximize the likelihood conditioned on a hard cluster assignment.

3. DISCRIMINATIVE SEGMENTATION

We now present our main contribution, the generalization of discriminative clustering for structured output variables to solve market segmentation problems. We introduce the problem setting in the next section and present a straightforward solution in terms of mixtures of experts in Section 3.2. An efficient approximation is devised in Section 3.3 and Section 3.3.3 discusses scalability issues.

3.1 Preliminaries

We are given a sample $M$ from a market where individuals are represented by tuples $(x, y) \in X \times Y$ encoding attributes $x$ and behavior $y$. Attributes $x$ may encompass individual features like gender, income, etc., while the expressed historic behavior is captured by $y \in Y$ and represented as a graphical model. The behaviors $y$ are governed by a family of distributions denoted by $P(y|\theta)$ with natural parameter $\theta$. In our running example on segmenting user navigation on the Web, attributes $x$ encode meta-information about the session such as the timestamp, the referrer domain, and the first page request and are represented as a feature vector. The behavior $y$ encodes sequences of the subsequent Web navigation and can for instance be represented as a Markov chain where nodes correspond to page views and connecting edges visualize clicks.

We aim to find an appropriate segmentation of the market $M$. Formally, the goal is to find a classifier $h : X \to \{1, \ldots, k\}$ that maps attributes $x$ to one of $k$ clusters, parameterized by $\theta = (\theta_1, \ldots, \theta_k)$, where the $\theta_j$ are chosen to maximize the likelihood of the observed behaviors $y$ of the respective segment. The number of clusters $k$ is assumed to be given by the application at hand, because it constitutes a trade-off between predictive power and effort spent for developing multiple market strategies. As $h$ and $\theta$ are not independent they need to be optimized jointly. Hence, over all classifiers $h$ and parameter collections $\theta$, we aim at maximizing the expected risk functional $R$ that is defined in terms of the segmented log-likelihood

$$R(h, \theta) = \int \log P(y \,|\, \theta_{h(x)}) \, dP(x, y). \tag{1}$$

Since the true joint distribution $P(x, y)$ is unknown, we replace Equation (1) by its empirical counterpart on the finite market sample of size $n$ given by $M = \{(x_i, y_i)\}_{i=1}^{n}$,

$$\hat{R}(\theta, h) = \sum_{i=1}^{n} \log P(y_i \,|\, \theta_{h(x_i)}). \tag{2}$$

Directly maximizing Equation (2) in terms of the component parameters $\theta$ and the classifier $h$ is infeasible, since the objective function is not only highly non-convex and non-continuous but an NP-hard problem because of combinatorial assignments. However, if the classifier $h$ was fixed, the $\theta_j$ could be optimized directly, as $h$ provides the segmentation and for each segment $j$ optimal parameters $\hat{\theta}_j$ are trivially computed by

$$\hat{\theta}_j = \operatorname*{argmax}_{\theta} \sum_{i : h(x_i) = j} \log P(y_i \,|\, \theta). \tag{3}$$

For many common distribution families, the maximum likelihood estimates $P(y|\theta)$ can be computed easily by counting or averaging over the observations in the segment, respectively. Vice versa, keeping the segment parameters $\theta_1, \ldots, \theta_k$ fixed, learning the classifier $h$ results in a standard multi-class classification scenario. Using linear models, $h$ can be written as

$$h(x) = \operatorname*{argmax}_{j \in \{1, \ldots, k\}} w_j^\top x, \tag{4}$$

where each segment has its own weight vector $w_j$. In the remainder, we will use $h$ and $w = (w_1, \ldots, w_k)^\top$ interchangeably. The next section exploits this observation and presents a joint alternating optimization scheme.
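As an illustration of Equations (2) to (4), the following minimal sketch assumes that each behavior $y$ is a list of discrete symbols governed by one multinomial per segment. The function names, the toy data, and the add-one smoothing (used only to avoid taking the logarithm of zero) are assumptions of this sketch and do not appear in the text above.

```python
import math
from collections import Counter

def classify(w, x):
    """Equation (4): index of the segment whose weight vector scores x highest."""
    scores = [sum(wd * xd for wd, xd in zip(wj, x)) for wj in w]
    return max(range(len(w)), key=scores.__getitem__)

def fit_segments(assignments, ys, k, vocab):
    """Equation (3) for multinomials: relative frequencies per segment.
    Add-one smoothing is an assumption of this sketch."""
    counts = [Counter() for _ in range(k)]
    for j, y in zip(assignments, ys):
        counts[j].update(y)
    thetas = []
    for c in counts:
        total = sum(c.values()) + len(vocab)
        thetas.append({v: (c[v] + 1) / total for v in vocab})
    return thetas

def segmented_log_likelihood(w, thetas, xs, ys):
    """Equation (2): sum over i of log P(y_i | theta_{h(x_i)}).
    The multinomial coefficient is dropped; it does not depend on theta."""
    total = 0.0
    for x, y in zip(xs, ys):
        theta = thetas[classify(w, x)]
        total += sum(math.log(theta[symbol]) for symbol in y)
    return total

# Toy market sample: two attributes, behaviors are lists of clicked categories.
xs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
ys = [["sports", "sports"], ["sports"], ["finance"], ["finance", "finance"]]
vocab = ["sports", "finance"]
w = [[1.0, 0.0], [0.0, 1.0]]          # one weight vector per segment

assignments = [classify(w, x) for x in xs]
thetas = fit_segments(assignments, ys, k=2, vocab=vocab)
objective = segmented_log_likelihood(w, thetas, xs, ys)
```

With the classifier fixed, re-estimating the segment parameters is a single counting pass, which is why the alternating scheme only needs a nontrivial solver for the classifier step.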

3.2 An Alternating Optimization Scheme

A straightforward approach to solve market segmentation problems is to alternate the optimization of the classifier and the clustering while fixing the other, respectively. As shown in Equation (3), keeping the classifier fixed allows us to apply standard maximum likelihood techniques to compute the natural parameters of the segments. We thus focus on deriving the classifier $h$ for a fixed clustering. We make use of the maximum-margin framework and deploy a re-scaled variant of the hinge loss to deal with uncertain cluster-memberships (or class labels).

    1: Input: $(x_1, y_1), \ldots, (x_n, y_n)$, $\lambda > 0$, $k > 1$
    2: Initialize $\theta$ randomly
    3: repeat
    4:   E-step: $w \leftarrow \operatorname{argmin}_{w', \xi \ge 0} \; \lambda \ldots \sum_{i=1}^{n} \xi_i$
    5:     s.t. $(w'_{j^*} - w'_j)^\top x \ge 1 - \xi_i$
    …

The idea is as follows. Intuitively, an individual $(x, y)$ should be assigned to the segment that realizes the highest log-likelihood with respect to $y$. However, two or more segments might be competing for the instance and realize similar log-likelihoods, in which case a winner-takes-all decision is prohibitive. We thus treat the difference of the log-likelihoods between the most likely segment $j^*$ and cluster $j' \neq j^*$ as a misclassification score, given by

$$s(y) = \log P(y \,|\, \theta_{j^*}) - \log P(y \,|\, \theta_{j'}). \tag{5}$$

These scores can be incorporated in a support vector machine by re-scaling the hinge loss and act like example-dependent costs (3). The re-scaled hinge loss becomes a convex upper bound of the difference of the log-likelihoods,

$$\ell(x) = s(y) \max\left\{0,\; 1 - (w_{j^*} - w_{j'})^\top x\right\}. \tag{6}$$
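The score in Equation (5) and the re-scaled hinge loss in Equation (6) can be sketched for a single individual as follows. The function name and the toy numbers are illustrative, and taking the maximum over all competing segments $j' \neq j^*$ is an assumption of this sketch; the text does not state how several competitors are aggregated.

```python
def rescaled_hinge(loglik, w, x):
    """loglik[j] = log P(y | theta_j); w[j] = weight vector of segment j.
    Returns (j_star, loss), where loss is the largest re-scaled hinge loss
    over the competing segments (the aggregation by max is an assumption)."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    j_star = max(range(len(loglik)), key=loglik.__getitem__)
    worst = 0.0
    for j in range(len(w)):
        if j == j_star:
            continue
        score = loglik[j_star] - loglik[j]             # Equation (5), non-negative
        margin = dot(w[j_star], x) - dot(w[j], x)      # (w_j* - w_j')^T x
        worst = max(worst, score * max(0.0, 1.0 - margin))  # Equation (6)
    return j_star, worst

loglik = [-1.0, -3.0, -1.2]                 # segment 0 explains y best
w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# The classifier agrees with the likelihoods: only the close competitor costs a little.
j_star, small_loss = rescaled_hinge(loglik, w, [1.0, 0.0])
# The classifier prefers segment 1, which explains y poorly: the loss is large.
_, large_loss = rescaled_hinge(loglik, w, [0.0, 1.0])
```

A margin violation against a segment with a nearly identical log-likelihood is cheap, whereas confusing the individual with a badly fitting segment is penalized in proportion to the likelihood gap.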

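Stacking up $w = (w_1, \ldots, w_k)^\top$ and $\xi = (\xi_1, \ldots, \xi_n)^\top$, we arrive at the following maximum-margin optimization problem

$$\min_{w,\, \xi \ge 0} \; \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad (w_{j^*} - w_j)^\top x \ge 1 - \xi_i.$$

…

$$\ldots \; \sum_{j'} \exp(\rho\, w_{j'}^\top x_i), \tag{7}$$

which still contains the mutually dependent variables $\theta$ and $w$. To obtain an efficiently solvable optimization problem, we express the objective as a continuous function of $w$ so that $w$ can be eliminated using collapsed inference. Instead of the hinge loss in Equation (6), we employ another tight convex upper bound in terms of the squared loss,

$$\ell(x) = \left(\log P(y \,|\, \theta_j) - w_j^\top x\right)^2.$$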

Implicitly, introducing the squared loss converts the classifier into a regressor that aims at predicting the log-likelihood for an individual $(x, y)$ and the $j$-th segment as accurately as possible. Assuming the log-likelihoods were predicted perfectly, the parameters $w$ would not only be optimal for the regression but also for Equation (2), as the classifier $h$ in Equation (4) would still return the most likely segment. Changing the loss function also has the advantage that now the optimal solution for $w$ can be computed analytically. The corresponding regularized optimization problem is also known as regularized least squares regression (RLSR) or ridge regression and is given by

$$\min \ldots$$
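To illustrate why the squared loss admits an analytic solution, the following sketch solves a standard ridge regression for one segment in closed form. It assumes the common objective $\sum_i (t_i - w^\top x_i)^2 + \lambda \lVert w \rVert^2$ with targets $t_i = \log P(y_i \,|\, \theta_j)$; the exact regularized problem and its kernelized form are not recoverable from the text above, so the objective, the function name, and the toy data are assumptions.

```python
def ridge_fit(X, t, lam):
    """Return w minimizing sum_i (t_i - w^T x_i)^2 + lam * ||w||^2
    by solving the normal equations (X^T X + lam I) w = X^T t."""
    d = len(X[0])
    A = [[sum(x[a] * x[b] for x in X) + (lam if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    b = [sum(x[a] * ti for x, ti in zip(X, t)) for a in range(d)]

    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    w = [0.0] * d
    for r in reversed(range(d)):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

# Toy targets standing in for log P(y_i | theta_j) of one segment j.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [-1.0, -2.0, -3.0]
w_exact = ridge_fit(X, t, lam=0.0)   # targets are exactly linear in x
w_ridge = ridge_fit(X, t, lam=1.0)   # regularization shrinks the weights
```

One such regression would be fitted per segment, after which Equation (4) picks the segment with the highest predicted log-likelihood.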

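… longer depend on $w$ and that has only the $\theta_j$ as free parameters,

$$\max_{\theta} \; \sum_{i=1}^{n} \log \sum_{j=1}^{k} P(y_i \,|\, \theta_j) \, \frac{\exp(\rho\, \pi(\theta_j)^\top \bar{x}_i)}{\sum_{j'} \exp(\rho\, \pi(\theta_{j'})^\top \bar{x}_i)}. \tag{11}$$

    1: Input: $(x_1, y_1), \ldots, (x_n, y_n)$, $\lambda$
    2: $\rho \leftarrow 1$, $t \leftarrow 1$, initialize $\theta^{(0)}$ randomly
    3: repeat
    4:   E-step: $Q(z_i = j) \leftarrow P(z_i = j \,|\, x_i, y_i, \rho, \lambda, \theta^{(t-1)})$
    5:   M-step: $\theta^{(t)} \leftarrow \operatorname{argmax}_{\theta} \sum_i \sum_j Q(z_i = j) \times$
    6:             $\log P(y_i, z_i = j \,|\, x_i, \rho, \lambda, \theta)$
    7:   $\rho \leftarrow \rho \times 1.1$, $t \leftarrow t + 1$
    8: until convergence
    9: $\hat{\theta} = \operatorname{argmax}_{\theta} \hat{R}(\theta, \alpha(\theta))$
    10: $\alpha \leftarrow \alpha(\hat{\theta})$

Following the general EM framework, we express the empirical risk functional in Equation (11) in terms of expectations $z_{i,j}$. This allows us to effectively pull the logarithm into the sum over segments for the M-step; we arrive at the optimization problem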

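$$\max_{\theta} \; \sum_{i} \sum_{j} z_{i,j} \log \left( P(y_i \,|\, \theta_j) \, \frac{\exp(\rho\, \pi(\theta_j)^\top \bar{x}_i)}{\sum_{j'} \exp(\rho\, \pi(\theta_{j'})^\top \bar{x}_i)} \right).$$

… $\exp(\rho\, \pi(\theta_j^{\text{old}})^\top \bar{x}_i)$ … are the Taylor coefficients and $C$ is a constant. Substituting Equation (12) into the objective function of the M-step and collecting the coefficients gives us

$$\operatorname*{argmax}_{\theta} \; \sum_{i} \sum_{j} \log P(y_i \,|\, \theta_j) \left( z_{i,j} + \rho \sum_{i'} \bar{x}_{i'i} \Big( z_{i',j} - \sum_{j'} z_{i',j'}\, t_{i'j'} \Big) \right), \tag{13}$$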

… algorithm effectively intractable. We can alleviate this by randomly partitioning the examples in the least-squares estimation in Equation (8) into $s$ disjoint subsets $S^{(1)}, \ldots, S^{(s)}$ of size $m$. For each subset the weight vectors $w^{(l)}$ are estimated separately, and thus within each subset the vectors $\pi(\theta_j)$ and the transformed examples $\bar{x}$ have only $m$ components. Consequently, in Equation (13) the inner summation over the examples only runs over the $m$ examples in the subset to which example $(x_i, y_i)$ belongs. Finally, we obtain the parameters of the classifier $h$ by averaging the weight vectors over all subsets, $w_j = \frac{1}{s} \sum_{l=1}^{s} w_j^{(l)}$.
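The partition-and-average step can be sketched as below. The helper names are hypothetical, and `column_means` is only a stand-in for the per-subset least-squares estimator, chosen so that the example stays short and its result is easy to check by hand.

```python
import random

def partition(examples, m, seed=0):
    """Randomly split the examples into disjoint subsets of size m
    (the last subset may be smaller if m does not divide the sample size)."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    return [[examples[i] for i in idx[s:s + m]] for s in range(0, len(idx), m)]

def average_weights(subsets, fit):
    """Estimate one weight vector per subset with `fit` and average them."""
    vectors = [fit(subset) for subset in subsets]
    dim = len(vectors[0])
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(dim)]

def column_means(subset):
    """Placeholder estimator (an assumption): the mean feature vector."""
    return [sum(row[c] for row in subset) / len(subset)
            for c in range(len(subset[0]))]

examples = [[float(i), 2.0 * i] for i in range(6)]
subsets = partition(examples, m=2)
w_avg = average_weights(subsets, column_means)
```

Because every subset is processed independently, the cost of each estimation depends on $m$ rather than on the full sample size $n$.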

A typical targeting strategy in our Yahoo! News example could for instance be a dynamic layout of the Web site to advertise news articles of categories that the respective user is probably interested in.

From a data perspective, modeling sequences of clicked categories by Markov processes is straightforward. However, Markov processes, e.g., visualized by transition matrices, are difficult to interpret as the entries encode interests with respect to the previous category. Taking the inferred Markov model properly into account would imply changing the website layout within a session depending on the previous category. A simpler way to obtain interpretable clusters is to use multinomial distributions for the output variables of interest. We use the sequences of user clicks enriched with the respective locations of the clicks. That is, the behavior $y$ consists of the multi-set of subsequently clicked categories $c$ and link sections $s$. The distribution $P(y|\theta_j)$ is defined as the product of multinomial distributions

$$P(y \,|\, \theta) = \prod_{i} P(c_i \,|\, \mu) \prod_{j} P(s_j \,|\, \nu),$$

where $\mu$ and $\nu$ are the parameter vectors governing the distributions over categories and link sections, respectively.
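The product of multinomials translates into a sum of log-probabilities. The sketch below is illustrative: the category and section names and the parameter values are made up, and the multinomial coefficient is omitted because it does not depend on $\mu$ or $\nu$.

```python
import math

def session_log_likelihood(categories, sections, mu, nu):
    """log P(y | theta) = sum_i log mu[c_i] + sum_j log nu[s_j],
    where theta = (mu, nu) holds the parameters of one segment."""
    return (sum(math.log(mu[c]) for c in categories)
            + sum(math.log(nu[s]) for s in sections))

# Hypothetical parameters of a single segment.
mu = {"sports": 0.5, "finance": 0.25, "world": 0.25}
nu = {"top": 0.5, "side": 0.5}

# One session: two clicked categories and the link sections they were clicked in.
ll = session_log_likelihood(["sports", "finance"], ["top", "top"], mu, nu)
```

Evaluating this function for every segment and taking the maximum yields the most likely segment $j^*$ for a session.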

The attributes $x$ of a session are represented as a binary feature vector encoding the most common referrer domains, the respective category of the first page view, as well as features encoding the timestamp; we use binary indicators for every day of the week, for each hour of the day, and for each hour of the week. For the collapsed algorithm, we use a linear kernel and randomly partition the training data into disjoint subsets of size 1,000 for computing the predicted log-likelihoods.
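The timestamp part of the attribute vector can be sketched as follows. The layout of the vector (7 day indicators, then 24 hour-of-day indicators, then 168 hour-of-week indicators) and the function name are assumptions; the referrer-domain and first-page-category indicators are omitted.

```python
from datetime import datetime

def time_features(ts):
    """Binary indicators for day of week, hour of day, and hour of week."""
    day, hour = ts.weekday(), ts.hour      # weekday(): Monday = 0 ... Sunday = 6
    features = [0] * (7 + 24 + 168)
    features[day] = 1                      # day of the week
    features[7 + hour] = 1                 # hour of the day
    features[7 + 24 + day * 24 + hour] = 1 # hour of the week
    return features

# A session that starts on a Sunday morning at 09:30.
x_time = time_features(datetime(2012, 8, 12, 9, 30))
```

The hour-of-week block lets a linear classifier treat, for example, Sunday 9 a.m. differently from Monday 9 a.m., which the two coarser blocks cannot express on their own.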

4.1 Baselines

We compare the collapsed algorithm with three baselines: the alternating optimization scheme in Section 3.2, a mixture of experts model, and a k-means based solution. The mixture of experts model (12) minimizes the squared error in terms of the within-cluster log-likelihoods and optimizes the marginal likelihood

$$\sum_{i} \log \sum_{j} P(y_i \,|\, \theta_j) \, P(z_i = j \,|\, x_i).$$

Instead of the prior $P(z = j)$ we have a conditional distribution $P(z = j \,|\, x)$ which is defined in analogy to the collapsed algorithm as

$$P(z = j \,|\, x) \propto \exp\Big( \sum_{i} \alpha_{j,i}\, k(x_i, x) \Big).$$

The mixture of experts model is optimized with a standard EM algorithm and therefore provides only probabilistic cluster assignments and does not take into account that sessions need to be assigned to only a single cluster.

The third baseline is derived from the straightforward, yet somewhat naïve, approach to segment the input space first and only then optimize the generative model in each cluster. The drawback of this non-iterative approach is that it generally does not lead to homogeneous behavior within clusters because the segments are fixed when estimating the generative models. We use k-means for finding the clustering, and estimate segment parameters $\theta$ by maximum likelihood based on the hard cluster assignments. The classifier $h$ classifies a new instance into the cluster with the nearest centroid.

In each setting, every algorithm is deployed 10 times with random parameter initializations and in the remainder we only report the results of the run with highest training likelihood.

All processing is anonymous and aggregated.

Colors occur more than once due to the large number of categories.
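The conditional distribution $P(z = j \,|\, x)$ of the mixture of experts baseline can be sketched with a linear kernel as follows. The coefficient values $\alpha_{j,i}$ and the training points are toy assumptions; subtracting the largest score before exponentiating is a standard numerical safeguard that leaves the normalized probabilities unchanged.

```python
import math

def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))

def gating(alpha, train_xs, x, kernel=linear_kernel):
    """P(z = j | x) proportional to exp(sum_i alpha[j][i] * k(x_i, x))."""
    scores = [sum(a_ji * kernel(xi, x) for a_ji, xi in zip(alpha_j, train_xs))
              for alpha_j in alpha]
    top = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

train_xs = [[1.0, 0.0], [0.0, 1.0]]
alpha = [[1.0, 0.0],                        # segment 0 relies on training point 0
         [0.0, 1.0]]                        # segment 1 relies on training point 1
probs = gating(alpha, train_xs, [2.0, 0.0])
```

In contrast to the classifier $h$ in Equation (4), this gating function returns a full distribution over segments, which is why the baseline yields only probabilistic assignments.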

4.2 Convergence

In this section, we evaluate the convergence behavior of the collapsed algorithm. Recall that the collapsed algorithm optimizes an approximate objective, where the hard cluster assignments are replaced by a soft-max controlled by an increasing factor $\rho$. To cancel out effects caused by the approximation, we substitute the resulting $\theta$ into the exact optimization criterion in Equation (2) and measure the respective objective value. Note that the results do not necessarily increase monotonically.
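The effect of the increasing factor $\rho$ can be illustrated in isolation: as $\rho$ grows, the soft-max concentrates on the highest-scoring segment and approaches a hard assignment. The scores and the number of iterations below are arbitrary toy choices; the multiplicative schedule $\rho \leftarrow 1.1\,\rho$ follows the collapsed algorithm.

```python
import math

def soft_assignment(scores, rho):
    """Soft-max over segment scores with sharpness rho."""
    top = max(scores)
    exps = [math.exp(rho * (s - top)) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

scores = [0.2, 0.5, 0.1]      # toy per-segment scores for one session
rho = 1.0
trace = []                    # weight of the best segment per iteration
for _ in range(60):
    trace.append(max(soft_assignment(scores, rho)))
    rho *= 1.1                # annealing schedule of the collapsed algorithm
```

Early iterations spread the weight almost evenly, which keeps the objective smooth, while late iterations behave like the winner-takes-all assignment required by the application.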

Figure 2: Objective values for the collapsed algorithm (solid) and the mixture of experts baseline (dashed), for different numbers of clusters k.

Figure 2 shows the results for different numbers of clusters for the collapsed algorithm (solid curves). For comparison, we also added the mixture of experts baseline (dashed curves). As expected, the true objective value is not monotonic, since both algorithms optimize an approximation to the exact optimization criterion. The figure also shows that the best values are obtained after at most 20 iterations.

Figure 3: Averaged predictive performance and standard error.

4.3 Predictive Performance

To evaluate the performance of the collapsed algorithm, we measure its predictive accuracy in terms of how well future behavior can be predicted. The classifier and the segmentation are learned jointly as described in Section 3 using the training set and then deployed to the test set. The sessions in the test set are first classified by the classifier into one of the segments, which is then used to predict the future clicks of the user. Since the final prediction is a complex variable, we refrain from expressing the performance in terms of error rates and measure the predictive log-likelihood $\log P(y \,|\, \theta_{h(x)})$ instead. We compare the collapsed algorithm to the alternating optimization scheme, the mixture of experts model, and the k-means based solution. We report on averages and standard errors over 10 repetitions with different random initializations.
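Averages and standard errors over repetitions can be computed as in the following sketch. It assumes the usual definition of the standard error as the sample standard deviation divided by $\sqrt{n}$; the values are toy numbers and not results from the experiments.

```python
import math

def mean_and_standard_error(values):
    """Sample mean and standard error (sample std. dev. / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Toy stand-ins for predictive log-likelihoods of repeated runs.
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
avg, stderr = mean_and_standard_error(runs)
```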

Figure 3 shows the predictive performance for varying numbers of clusters. Not surprisingly, all methods perform equally poorly for only a single cluster. For only a few clusters, the mixture of experts baseline performs about as well as the collapsed algorithm. We credit this finding to the existence of easy-to-reach solutions that do not necessarily require hard cluster assignments in the θ-steps. However, when the number of clusters grows, the performance of the mixture of experts approach decreases slightly while that of the collapsed model increases. Here it becomes more and more important to select the parameters in a way that allows discriminating well between the clusters, and thus the collapsed algorithm outperforms the baselines significantly. The alternating algorithm and the k-means baseline perform significantly worse than the collapsed algorithm. Only for 20 and more clusters does the alternating algorithm produce better results than the mixture of experts model. Note that k-means performs worst as it does not use an alternating update scheme but first learns the clustering and then estimates the generative models using the fixed segments.

It is apparent that the predictive performance levels off after increasing the number of clusters beyond 10. Intuitively, this observation can be explained by a trade-off between classification and segmentation: even if a more fine-grained clustering would be able to predict the future behavior more accurately, the classifier cannot discriminate well between a larger number of similar clusters to identify the best-matching segment. We observe a natural trade-off between predictive power and the effort that has to be spent for developing and maintaining target strategies for a large number of market segments.

The execution time of the collapsed algorithm for a solution with 10 clusters is within the range of 3 hours, compared to about an hour each for the mixture of experts and the k-means baselines. The alternating optimization, however, takes about 11 hours, which renders its application infeasible in practice.

4.4 Discussion

Market segmentation aims at grouping similar individuals of a population together that share the same needs or that have similar demands. The goal is to target individuals within the same segment jointly, e.g., to advertise a new product. To this end, the segments need to be interpretable to derive a concise description of the segments that can be converted into a segment-specific targeting strategy.

Figure 4: Visualization of click frequencies for the five most frequent link locations using four clusters (panels: Cluster 1, Cluster 2, Cluster 3, Cluster 4).

Figure 5: Click volumes of categories over time for the four clusters.

In our collapsed algorithm, generative models in each segment encode the contained behavior and interest. The flexibility of the probabilistic inference machinery allows us to project the behavior onto discriminative variables to visualize different characteristics of the clusters. In this section we give two examples for such projections to visualize differently distributed user behavior across the clustering. For simplicity, we use a solution with four clusters.

The first example shows a visualization of segment-specific user clicks in terms of their location on the Web page. Including the location of clicks is necessary for altering the layout dynamically, as changes in frequently clicked areas will affect the behavior more than substituting a redundant and less clicked widget. We focus on the five modules of the Web site that receive the highest number of clicks in the data.

Figure 4 shows the results. Segments 2, 3, and 4 exhibit very similar click behavior in terms of the clicked modules. By contrast, cluster 1 differs significantly in the usage of the Web components. On average, users in cluster 1 prefer the location visualized in black over the alternatives compared to users in the other segments. This observation could be exploited to directly devise target strategies. While members of clusters 2–4 should be addressed by changing the content of the modules visualized in gray or dark blue, users in the first segment could also be triggered by the module encoded in black.

Analogously, the behavior could be projected on the categories to visualize the respective distribution of categories for each segment. However, we choose to show a more interesting projection for lack of space. The incorporation of the timestamps of the sessions allows us to visualize the clusters in time. As the feature representation of timestamps encompasses one week, Figure 5 shows the average category distribution across the days of the week where different colors correspond to different categories.³

Apparently, the clusters do not only differ in terms of the categories but also specialize on certain periods in time because the segments are optimized using all available data, that is, attribute and behavior encoding variables. The first cluster clearly specializes on Sundays and is characterized by a clean topic distribution. The three other clusters also possess dominant categories but focus more on working days than on weekends. Cluster 4 contains the most diverse set of categories and acts like a basin for categories that are not as easy to discriminate. Here it becomes obvious that a solution with only four clusters may not be optimal for the task at hand. When we increase the maximal number of clusters, the category distribution of clusters becomes cleaner, that is, fewer categories are likely. Additionally, clusters adapt better to specialized periods such as working days or weekends for larger k.

Taking various such projections into account describes segments from different angles and helps to find a concise targeting strategy. For instance, knowing the articles that are likely to be read in an ongoing session helps to address the respective user in various ways including displaying ads. Incorporating context information such as the click behavior of the segments finally allows for tailoring web pages to each segment and increasing the overall user experience.

5. CONCLUSION

We studied discriminative clustering for structured and complex response variables that can be represented as generative models. The problem setting matches market segmentation tasks where populations are to be segmented into disjoint groups. Solving market segmentation-like problems appropriately not only involves a clustering of the individuals but also learning a classifier that discriminates well between the segments, for instance to allow for classifying new customers into one of the groups. The two components need to be learned jointly and have access to different pieces of information about the individuals: the classifier needs to group individuals on the basis of a priori available information while the clustering aims at grouping people with similar (future) needs or behavior.

We devised two algorithms based on alternating optimization and collapsed inference, respectively. Empirical results showed that the collapsed variant is not only more efficient but also accurately predicts the click behavior of users for Yahoo! News. The generative nature of the clustering led to interpretable clusters. We showed how projections of the clustering on only a few variables allowed for targeting the detected segments individually and contributed to user understanding.

Our approach is not restricted to Yahoo! News and can generally be applied to arbitrary market segmentation tasks and other Web sites to improve the overall user experience. As our approach is orthogonal to personalized approaches, future work will study the integration of both frameworks.

Acknowledgements

Part of this work was supported by the German Science Foundation under the reference number GA 1615/1-1.

References

[1] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
[2] S. Bickel and T. Scheffer. Multi-view clustering. In Proceedings of the IEEE International Conference on Data Mining. Citeseer, 2004.
[3] U. Brefeld, P. Geibel, and F. Wysotzki. Support vector machines with example dependent costs. In Proceedings of the European Conference on Machine Learning, 2003.
[4] R. D'Andrade. U-statistic hierarchical clustering. Psychometrika, 4:58–67, 1978.
[5] F. De la Torre and T. Kanade. Discriminative cluster analysis. In Proceedings of the 23rd International Conference on Machine Learning, pages 241–248. ACM, 2006.
[6] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.
[7] P. D'Urso and L. D. Giovanni. Temporal self-organizing maps for telecommunications market segmentation. Neurocomput., 71:2880–2892, 2008.
[8] R. Gomes, A. Krause, and P. Perona. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems, 2010.
[9] J. Huang, G. Tzeng, and C. Ong. Marketing segmentation using support vector clustering. Expert Systems with Applications, 32(2):313–317, 2007.
[10] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
[11] S. C. Johnson. Hierarchical clustering schemes. Psychometrika, 2:241–254, 1967.
[12] M. Jordan and R. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994.
[13] M. Y. Kiang, M. Y. Hu, and D. M. Fisher. An extended self-organizing map network for market segmentation – a telecommunication example. Decision Support Systems, 42:36–47, 2006.
[14] S. P. Lloyd. Least square quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[15] G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. Advances in Neural Information Processing Systems, 22:1231–1239, 2009.
[16] M. Namvar, M. Gholamian, and S. KhakAbi. A two phase clustering method for intelligent customer segmentation. In 2010 International Conference on Intelligent Systems, Modelling and Simulation, pages 215–219. IEEE, 2010.
[17] J. Sinkkonen, S. Kaski, and J. Nikkilä. Discriminative clustering: Optimal contingency tables by learning metrics. Machine Learning: ECML 2002, pages 109–137, 2002.
[18] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl. Constrained k-means clustering with background knowledge. In Proceedings of the International Conference on Machine Learning, 2001.
[19] M. Wedel and W. Kamakura. Market segmentation: conceptual and methodological foundations, volume 8. Springer, 2000.
[20] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. Advances in Neural Information Processing Systems, 17:1537–1544, 2005.
[21] J. Ye, Z. Zhao, and M. Wu. Discriminative k-means for clustering. In Advances in Neural Information Processing Systems, 2007.
[22] W. Yu and G. Qiang. Customer segmentation of port based on the multi-instance kernel k-aggregate clustering algorithm. In Management Science and Engineering, 2007. ICMSE 2007. International Conference on, pages 210–215, 2007.
[23] B. Zhao, J. Kwok, and C. Zhang. Maximum margin clustering with multivariate loss function. In Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on, pages 637–646. IEEE, 2009.
[24] B. Zhao, F. Wang, and C. Zhang. Efficient multiclass maximum margin clustering. In Proceedings of the 25th International Conference on Machine Learning, pages 1248–1255. ACM, 2008.
