PeterHaider
∗
Dep.ofComputerScienceUniv.ofPotsdam,Germany
haider@cs.uni-potsdam.de
chiarluc@yahoo-inc.com
WebResearchGroupUniversitatPompeuFabra
Barcelona,Spain
LucaChiarandini
†
brefeld@uni-bonn.de
UniversityofBonnBonn,Germany
UlfBrefeld
ABSTRACT
Westudydiscriminativeclusteringformarketsegmentationtasks.Theunderlyingproblemsettingresemblesdiscrimi-nativeclustering,however,existingapproachesfocusonthepredictionofunivariateclusterlabels.Bycontrast,marketsegmentsencodecomplex(future)behavioroftheindivid-ualswhichcannotberepresentedbyasinglevariable.Inthispaper,wegeneralizediscriminativeclusteringtostruc-turedandcomplexoutputvariablesthatcanberepresentedasgraphicalmodels.Wedevisetwonovelmethodstojointlylearntheclassifierandtheclusteringusingalternatingop-timizationandcollapsedinference,respectively.Thetwoapproachesjointlylearnadiscriminativesegmentationoftheinputspaceandagenerativeoutputpredictionmodelforeachsegment.WeevaluateourmethodsonsegmentingusernavigationsequencesfromYahoo!News.Theproposedcollapsedalgorithmisobservedtooutperformbaselineap-proachessuchasmixtureofexperts.Weshowcaseexem-plaryprojectionsoftheresultingsegmentstodisplaytheinterpretabilityofthesolutions.
CategoriesandSubjectDescriptors
I.5.3[Clustering]:Algorithms
GeneralTerms
Algorithms,Experimentation
Keywords
DiscriminativeClustering,MarketSegmentation
1.INTRODUCTION
Marketsegmentationrevealsdivisionsinagivenmarket,whereamarketreferstoapopulationofinterestsuchas
criminatedfromeachotherbytheclassifier.Combiningthetwocriteriathusguaranteesconciseclusteringsandaccu-rateclassifiers.Existingapproachesfocusonclusteringapopulationandpredictingaclusterlabelforanewinstance.Bycontrast,marketsegmentationismorecomplex.Inmar-ketsegmentationtasks,weneedtodifferentiatebetweenthedatathatcharacterizesindividualsandthedatathatcharac-terizestheirfuturebehavior.Theclusteringclearlyneedstotakeallavailableinformationintoaccounttogeneratemean-ingfulsegments.However,theclassifierdoesnothaveaccesstofutureeventsandneedstotakeadecisionontheavailableinformationsuchasgender,income,etc.Thisobservationrendersexistingapproachestodiscriminativeclusteringtoorestrictiveformarketsegmentationtasks.
Inthispaperwegeneralizediscriminativeclusteringformarketsegmentationtasksusingthestructuredpredictionframework.Wedifferentiatebetweenattributesofacus-tomerandherinterests/behavior.Attributesareaprioriavailablefeaturesofindividualsofthepopulationsuchasgenderorincome.Herbehaviorisacollectionofinteract-ingvariablesdescribingasegment.Assegmentsneedtobeinterpretable,wemodeltheoutputdataasacomplexandstructuredvariablewhichcanberepresentedasagraphicalmodel.Thedistinctionallowsforlearningaclassifieronlyontheattributes,computingtheclusteringonbothattributesandbehavior,andfinallysummarizingthesegmentsonlyintermsofthebehavior.
Wedevisetwosolutionswhicharebasedontheregular-izedempiricalriskminimizationframework.Thefirstisastraightforwardadaptationofmixturesofexperts.Classi-fierandclusteringareoptimizedusinganalternatingstrat-egywherewefixonecomponentwhileoptimizingtheother.Thesecondsolutionusesapproximationsandintegratesoutparametersoftheclassifierusingcollapsedinferenceforeffi-ciency.Bothapproachesusegenerativemodelsfortheout-putstructureand,incontrasttoconventionaldiscriminativeclusteringapproaches,donotinvolvetrade-offparametersforclassificationaccuracyandclusterconsistency(classbal-ance)becausetheoptimizationproblemsarenotpronetotrivialanddegeneratesolutions.
Usecasesofourmethodscontaintraditionalmarketseg-mentationtasks.Considerforinstanceacompanythataimsatpromotinganewproductorahotelchainthatintendstolurevisitorswithspecialoffers.Ourmethodsnotonlycom-puteameaningfulsegmentationofthecustomersbutalsoallowfordevisingappropriatetargetingstrategiesfromthegraphicalmodels.Moreover,ourmethodservesasdiscrim-inativeclusteringforstructuredvariables,wherethetaskisnottooutputasingleclass/clusterlabelbuttheaveragestructureforeverysegment.Thedifferentiationbetweenattributesandbehaviorincreasestherangeofapplicationsthatcanbeaddressed.Aspecial–butstillnovel–caseisobtainedwhenattributesandbehaviorpartiallyoverlap.Empirically,westudyourmethodsonanotherinterestingusecase:SegmentingusernavigationsessionsontheWebfordisplayingsegment-specificwebsitelayouts.Weexperi-mentonalargeclicklogfromYahoo!News.Theattributedataisassembledfrommeta-informationaboutthesessionsuchasthetimestamp,thereferrerdomain,andthefirstpagerequest.Thebehaviorconsistsofsubsequentnaviga-tionactionsgivenbyclicksequences.Thegenerativerep-resentationofthebehaviordataisinterpretableandcanbeeasilytransformedintosegment-specificlayouts.
Theremainderofthepaperisstructuredasfollows.Sec-tion2discussestherelationshipofourproblemsettingwithpreviouslystudiedsettingsandmethods.InSection3wederivetwoalgorithmstooptimizetheempiricalcounterpartoftheexpectedsegmentedlog-likelihood.Section4reportsonempiricalresultsusingalargeclicklogfromacommercialnewsproviderandSection5concludes.
2.RELATEDWORK
Marketsegmentationtasksareoftensolvedusingneuralnetworkssuchasself-organizingmaps(13;7).Kiangetal.(13)forinstanceextendself-organizingmapstogroupcus-tomersaccordingtotheirattitudetowardsdifferentcommu-nicationsmodes.D’UrsoanddeGiovanni(7)usethenatu-ralclusteringpropertyofself-organizingmapstogetherwithdissimilaritymeasureswhichcapturetemporalstructureofthedata.Ingeneral,clusteringdatawithself-organizingmapsandvariantsthereofinherentlyimplementsthehomo-geneityassumptionofmarketsegmentation.However,clas-sifyingnewinstancesintotheclusteringisoftendifficultanditisnotpossibletooutputgenerativemodelstosummarizetheresultingclusters.Additionally,theoptimizationcrite-rionofself-organizingmapsishighlysensitivetotheactualinitializationandusuallyconvergestodifferentlocaloptima.Relatedtomarketsegmentationisthetaskofestimatingamixturemodelforobservations(6).Introducingselectorvariablesencodingprobabilisticcluster-memberships,maxi-mizingthelog-likelihoodbymarginalizingovertheselectorisusuallystraightforward.Theselectorcanbemodeledinadata-dependentordata-independentfashionbuttheprob-abilisticnatureofthecluster-membershipsrenderadirectapplicationformarketsegmentationtasksimpossible.
Discriminativeclusteringsimultaneouslycomputesaseg-mentationofthedataathandandaclassifierthatdiscrim-inatestheresultingclusterswell.Existingapproachesin-cludeprojectionsintolower-dimensionalsubspaces(5),jointoptimizationofmax-marginclassifiersandclusterings(20;24),theoptimizationofscattermetrics(21),andthemax-imizationofaninformationtheoreticcriteriontobalanceclassseparationandclassifiercomplexity(8).Sinkkonenetal.(17)aimtofindclustersthatarehomogeneousinauxiliarydatagivenbyadditionaldiscretevariables.Theabovementionedapproachesdonotpredictanyoutputvari-ablebutfocusonthediscreteclustervariable.Moreover,inourgeneralizedproblemsetting,instancesarerepresentedasinput-outputpairs.Theclassifierdiscriminatestheclustersgivenonlytheinput,whereastheclusterparametersneedtoaccuratelyestimatetheoutputsofthecontainedinstances.Previousworkondiscriminativeclusteringdoesnotsplitin-stancesintotwoparts.Theyrepresentinstancesasasingleinputwhichconsequentlyallowstheclassifierstoaccessthewholeexampleatdecisiontime.Thesameassumptioniscommonlybeingmadeinmarketsegmentationstudiesthatinvolvemodel-basedclusteringapproaches,(9;16;22)butprohibitsanaturalsolutionformarketsegmentationtasks.ThisproblemsettingcanbeseenasanalterationofthesettingwhichtheMixtureofExpertsapproach(10;12)aimstosolve,wherethebehavioryispredictedgiventheat-tributesxasamixturemodelwherethemixturecomponentweightsdependagainonx.Inourcase,mixturecomponentweightshavetobealwayspointdistributionsasdemandedbytheapplication.Framingthedistributionofygiventhemixturecomponentasapuregenerativemodelallowsusto
418
deriveamoreefficientalgorithmthanthatoftheMixtureofExpertsapproach.
Zhaoetal.(23)proposedamaximum-margincluster-ingformultivariatelossfunctions.Minimizingthecomplexlossesallowsforcapturingstructuraldifferencesthatcannotbeexpressedintermsofstandardmisclassificationrates.Inprinciple,bydefiningalossfunctionthatcapturesthedifferencesoftwoclusterings,onecouldpossiblysolvemar-ketsegmentationtasksastheirapproachimplicitlyfavorsclusteringsthatareeasilydiscriminable.However,thelossfunctioncannotbeexpressedintermsofthecontingencyta-bleofthetwoclusterings,andthedecodingproblemintheinnerloopofthealgorithm,thatisfindingthemostviolatedconstraint,becomesintractableinpractice.
Alsorelatedtoourproblemsettingaremulti-viewclus-teringapproaches,wherethedataissplitintotwodisjointfeaturesets,whicharesometimesalsocalledviews.BickelandScheffer(2)presentanintertwinedExpectationMaxi-mizationalgorithmtocomputethemaximumlikelihoodso-lutionforoneviewusingtheexpectationsprovidedbyitspeerview.Thetwodataviewsaremodeledgenerativelyandthealgorithmmaximizesthejointmarginallikelihood.Bycontrast,weaimtofindadiscriminativeclassifierontheinputviewandinsteadofmaximizingthemarginallikeli-hoodoftheoutputviewweseektomaximizethelikelihoodconditionedonahardclusterassignment.
maximizingtheexpectedriskfunctionalRthatisdefinedintermsofthesegmentedlog-likelihood
R(h,θ)=logP(y|θh(x))dP(x,y).(1)SincethetruejointdistributionP(x,y)isunknown,were-placeEquation(1)byitsempiricalcounterpartonthefinite
marketsampleofsizengivenbyM={(xi,yi)}ni=1
ˆ(θ,h)=R
ni=1
logP(yi|θh(xi)).
(2)
DirectlymaximizingEquation(2)intermsofthecompo-nentparametersθandtheclassifierhisinfeasible,sincetheobjectivefunctionisnotonlyhighlynon-convexandnon-continuousbutanNP-hardproblembecauseofcombinato-rialassignments.However,iftheclassifierhwasfixed,theθjcouldbeoptimizeddirectly,ashprovidesthesegmentation
ˆjaretriviallyandforeachsegmentjoptimalparametersθ
computedby
ˆj=argmaxθlogP(yi|θ).(3)
θ
i:h(xi)=j
3.DISCRIMINATIVESEGMENTATION
Wenowpresentourmaincontribution,thegeneralization
ofdiscriminativeclusteringforstructuredoutputvariablestosolvemarketsegmentationproblems.Weintroducetheproblemsettinginthenextsectionandpresentastraightfor-wardsolutionintermsofmixturesofexpertsinSection3.2.AnefficientapproximationisdevisedinSection3.3andSec-tion3.3.3discussesscalabilityissues.
Formanycommondistributionfamilies,themaximumlike-lihoodestimatesP(y|θ)canbecomputedeasilybycountingoraveragingovertheobservationsinthesegment,respec-tively.Viceversa,keepingthesegmentparametersθ1,...,θkfixed,learningtheclassifierhresultsinastandardmulti-classclassificationscenario.Usinglinearmodels,hcanbewrittenas
⊤
h(x)=argmaxwjx,
j∈{1,...,k}
(4)
3.1Preliminaries
WearegivenasampleMfromamarketwhereindividuals
arerepresentedbytuples(x,y)∈X×Yencodingattributesxandbehaviory.Attributesxmayencompassindividualfeatureslikegender,income,etcwhiletheexpressedhistoricbehavioriscapturedbyy∈Yandrepresentedasagraph-icalmodel.ThebehaviorsyaregovernedbyafamilyofdistributionsdenotedbyP(y|θ)withnaturalparameterθ.InourrunningexampleonsegmentingusernavigationontheWeb,attributesxencodemeta-informationaboutthesessionsuchasthetimestamp,thereferrerdomain,andthefirstpagerequestandisrepresentedasafeaturevector.ThebehavioryencodessequencesofthesubsequentWebnavigationandcanforinstanceberepresentedasaMarkov-chainwherenodescorrespondtopageviewsandconnectingedgesvisualizeclicks.
Weaimtofindanappropriatesegmentationofthemar-ketM.Formally,thegoalistofindaclassifierh:X→{1,...,k}thatmapsattributesxtooneofkclusters,pa-rameterizedbyθ=(θ1,...,θk),wheretheθjarechosentomaximizethelikelihoodoftheobservedbehaviorsyoftherespectivesegment.Thenumberofclusterskisassumedtobegivenbytheapplicationathand,becauseitconsti-tutesatrade-offbetweenpredictivepowerandeffortspentfordevelopingmultiplemarketstrategies.Ashandθarenotindependenttheyneedtobeoptimizedjointly.Hence,overallclassifiershandparametercollectionsθ,weaimat
whereeachsegmenthasitsownweightvectorwj.Inthere-mainder,wewillusehandw=(w1,...,wk)⊤interchange-ably.Thenextsectionexploitsthisobservationandpresentsajointalternatingoptimizationscheme.
3.2AnAlternatingOptimizationScheme
Astraightforwardapproachtosolvemarketsegmentationproblemsistoalternatetheoptimizationoftheclassifierandtheclusteringwhilefixingtheother,respectively.AsshowninEquation(3),keepingtheclassifierfixedallowstoapplystandardmaximumlikelihoodtechniquestocomputethenaturalparametersofthesegments.Wethusfocusonderivingtheclassifierhforafixedclustering.Wemakeuseofthemaximum-marginframeworkanddeployare-scaledvariantofthehingelosstodealwithuncertaincluster-memberships(orclasslabels).
Theideaisasfollows.Intuitively,anindividual(x,y)shouldbeassignedtothesegmentthatrealizesthehigh-estlog-likelihoodwithrespecttoy.However,twoormoresegmentsmightbecompetingfortheinstanceandrealize
1:Input:(x1,y1),...,(xn,yn),λ>0,k>12:Initializeθrandomly3:repeat4:E-step:w←argminw′,ξ≥0λ5:
s.t.
′
(wj∗
i
i=1ξin
′⊤
−wj)x≥1−ξi
n
419
similarlog-likelihoods,inwhichcaseawinner-takes-allde-cisionisprohibitive.Wethustreatthedifferenceofthelog-likelihoodsbetweenthemostlikelysegmentj∗andclusterj′=j∗asamisclassificationscore,givenby
s(y)=logP(y|θj∗)−logP(y|θj′).
(5)
Thesescorescanbeincorporatedinasupportvectorma-chinebyre-scalingthehingelossandactlikeexample-de-pendentcosts(3).There-scaledhingelossbecomesaconvexupperboundofthedifferenceofthelog-likelihoods,
ℓ(x)=s(y)max0,1−(wj∗−wj′)⊤x=(ξ
.(6)
Stackingupw=(w1,...,wk)⊤andξ1,...,ξn)⊤,we
arriveatthefollowingmaximum-marginoptimizationprob-lem
wmin
λ
,ξ≥0
n
nξi
i=1
s.t.
(w∗−wj)⊤
x≥1ξi
ji
−whichstillcontainsthemutuallydependent
j′
exp(ρw⊤
j′xi)
,(7)
variablesθand
w.Toobtainanefficientlysolvableoptimizationproblem,weexpresstheobjectiveasacontinuousfunctionofwsothatwcanbeeliminatedusingcollapsedinference.InsteadofthehingelossinEquation(6),weemployanothertightconvexupperboundintermsofthesquaredloss,
ℓ(x)=(logP(y|θj)−w⊤
jx)2.
Implicitly,introducingthesquaredlossconvertstheclassi-fierintoaregressorthataimsatpredictingthelog-likelihood
foranindividual(x,y)andthej-thsegmentasaccurateaspossible.Assumingthelog-likelihoodswerepredictedperfectly,theparameterswwouldnotonlybeoptimalfortheregressionbutalsoforEquation(2)astheclassifierhinEquation(4)wouldstillreturnthemostlikelysegment.Changingthelossfunctionalsohastheadvantagethatnowtheoptimalsolutionforwcanbecomputedanalytically.Thecorrespondingregularizedoptimizationproblemisalsoknownasregularizedleastsquaresregression(RLSR)orridgeregressionandisgivenby
min
λ
w
longerdependonwandthathasonlytheθjasfreeparam-eters,
max
θ
ni=1
log
kj=1
P(yi|θj)
exp(ρπ(θj)⊤x¯i)
1:
2:3:4:5:6:7:8:9:10:
Input:(x1,y1),...,(xn,yn),λ
ρ←1,t←1,initializeθ(0)randomlyrepeat
E-step:Q(zi=j)←P(zi=j|xi,yi,ρ,λ,θ(t−1))
M-step:θ(t)←argmaxθijQ(zi=j)×
logP(yi,zi=j|xi,ρ,λ,θ)
ρ←ρ×1.1,t←t+1untilconvergenceˆ=argmaxRˆ(θ,α(θ))θθ
ˆ)α←α(θ
icalFollowingriskfunctionalthegeneralinEMEquationframework,
j′
exp(ρπ(θj)⊤x¯i)
.
(11)inwetermsexpressoftheempir-expec-tationszi,j.Thisallowsustoeffectivelypullthelogarithm
intothesumoversegmentsfortheM-step;wearriveattheoptimizationproblem
maxθ
zi,jlog
P(yi|θj)
exp(ρπ(θj)⊤x¯i)i
j
j
exp(ρπ(θoldj)⊤x¯i)
aretheTaylorcoeffi-
cientsandCisaconstant.SubstitutingEquation(12)intotheobjectivefunctionoftheM-stepandcollectingthecoef-ficientsgivesus
argmaxlogP(yi|θj)
zi,j+ρx¯i′izi′,j)−
θ
i
j
i′
zi′,j′ti′j′
j′
,(13)
algorithmeffectivelyintractable.Wecanalleviatethisbyrandomlypartitioningtheexamplesintheleast-squareses-timationinEquation(8)intosdisjointsubsetsS(1),...,S(s)ofsizem.Foreachsubsettheweightvectorsw(l)areesti-matedseparately,andthuswithineachsubsetthevectorsπ(θj)andthetransformedexamplesx¯haveonlymcompo-nents.Consequently,inEquation(13)theinnersummationovertheexamplesonlyrunsoverthemexamplesinthesubsettowhichexample(xi,yi)belongs.Finally,weobtaintheparametersoftheclassifierhbyaveragingtheweightvectorsoverallsubsets,wj=1
AtypicaltargetingstrategyinourYahoo!NewsexamplecouldforinstancebeadynamiclayoutoftheWebsitetoadvertisenewsarticlesofcategoriesthattherespectiveuserisprobablyinterestedin.
Fromadataperspective,modelingsequencesofclickedcategoriesbyMarkovprocessesisstraightforward.How-ever,Markovprocesses,e.g.,visualizedbytransitionmatri-ces,aredifficulttointerpretastheentriesencodeinterestswithrespecttothepreviouscategory.TakingtheinferredMarkovmodelproperlyintoaccountwouldimplychangingthewebsitelayoutwithinasessiondependingontheprevi-ouscategory.Asimplerwaytoobtaininterpretableclustersistousemultinomialdistributionsfortheoutputvariablesofinterest.Weusethesequencesofuserclicksenrichedwiththerespectivelocationsoftheclicks.Thatis,thebehavioryconsistsofthemulti-setofsubsequentlyclickedcategoriescandlinksectionss.ThedistributionP(y|θj)isdefinedastheproductofmultinomialdistributions
P(y|θ)=P(ci|µ)P(sj|ν),
l
j
whereµandνaretheparametervectorsgoverningthedis-tributionsovercategoriesandlinksections,respectively.
Theattributesxofasessionisrepresentedasabinaryfeaturevectorencodingthemostcommonreferrerdomains,therespectivecategoryofthefirstpageview,aswellasfea-turesencodingthetimestamp;weusebinaryindicatorsforeverydayoftheweek,foreachhouroftheday,andforeachhouroftheweek.Forthecollapsedalgorithm,weusealinearkernelandrandomlypartitionthetrainingdataintodisjointsubsetsofsize1,000forcomputingthepredictedlog-likelihoods.
4.1Baselines
Wecomparethecollapsedalgorithmwiththreebaselines,thealternatingoptimizationschemeinSection3.2,amix-tureofexpertsmodelandak-meansbasedsolution.Themixtureofexpertsmodel(12)minimizesthesquarederrorintermsofthewithin-clusterlog-likelihoodsandoptimizesthemarginallikelihood
logP(yi|θj)P(zi=j|x).
i
j
Allprocessingisanonymousandaggregated
Colorsoccurmorethanonceduetothelargenumberofcategories.
2
1
ThemixtureofexpertsmodelisoptimizedwithastandardEM-algorithmandthereforeprovidesonlyprobabilisticclus-terassignmentsanddoesnottakeintoaccountthatsessionsneedtobeassignedtoonlyasinglecluster.
Thethirdbaselineisderivedfromthestraightforward,yetsomewhatna¨ıve,approachtosegmenttheinputspacefirstandonlythenoptimizethegenerativemodelineachcluster.Thedrawbackofthisnon-iterativeapproachisthatitdoesgenerallynotleadtohomogeneousbehaviorwithinclustersbecausethesegmentsarefixedwhenestimatingthegenera-tivemodels.Weusek-meansforfindingtheclustering,andestimatesegmentparamtersθbymaximumlikelihoodbasedonthehardclusterassignments.Theclassifierhclassifiesanewinstanceintotheclusterwiththenearestcentroid.
InsteadofthepriorP(z=j)wehaveaconditionaldistribu-tionP(z=j|x)whichisdefinedinanalogytothecollapsedalgorithmas
P(z=j|x)∝exp(αj,ik(xi,x)).
i
422
Ineachsetting,everyalgorithmisdeployed10timeswithrandomparameterinitializationsandintheremainderweonlyreporttheresultsoftherunwithhighesttraininglike-lihood.
4.2Convergence
Inthissection,weevaluatetheconvergencebehaviorof
thecollapsedalgorithm.Recallthatthecollapsedalgorithmoptimizesanapproximateobjective,wherethehardclus-terassignmentsarereplacedbyasoft-maxcontrolledbyanincreasingfactorρ.Tocancelouteffectscausedbytheapproximation,wesubstitutetheresultingθintotheex-actoptimizationcriterioninEquation(2)andmeasuretherespectiveobjectivevalue.Notethattheresultsdonotnec-essarilyincreasemonotonically.
Figure3:Averagedpredictiveperformanceandstandarderror.
Figure2:Objectivevaluesforthecollapsedalgo-rithm(solid)andthemixtureofexpertsbaseline(dashed),fordifferentnumbersofclustersk.Figure2showstheresultsfordifferentnumbersofclus-tersforthecollapsedalgorithm(solidcurves).Forcompar-ison,wealsoaddedthemixtureofexpertsbaseline(dashedcurves).Asexpected,thetrueobjectivevalueisnotmono-tonic,sincebothalgorithmsoptimizeanapproximationtotheexactoptimizationcriterion.Thefigurealsoshowsthatthebestvaluesareobtainedafteratmost20iterations.
4.3PredictivePerformance
Toevaluatetheperformanceofthecollapsedalgorithm,wemeasureitspredictiveaccuracyintermsofhowwellfu-turebehaviorcanbepredicted.Theclassifierandtheseg-mentationarelearnedjointlyasdescribedinSection3us-ingthetrainingsetandthendeployedtothetestset.Thesessionsinthetestsetarefirstclassifiedbytheclassifierinoneofthesegmentswhichisthenusedtopredictthefutureclicksoftheuser.Sincethefinalpredictionisacom-plexvariable,werefrainfromexpressingtheperformanceintermsoferrorratesandmeasurethepredictivelog-likelihoodlogP(y|θh(x))instead.Wecomparethecollapsedalgorithmtothealternatingoptimizationscheme,themixtureofex-pertsmodel,andthek-meansbasedsolution.Wereportonaveragesandstandarderrorsover10repetitionswithdiffer-entrandominitializations.
Figure3showsthepredictiveperformanceforvaryingnumbersofclusters.Notsurprisingly,allmethodsperform
equallyworseforonlyasinglecluster.Foronlyafewclus-ters,themixtureofexpertsbaselineperformsaboutaswellasthecollapsedalgorithm.Wecreditthisfindingtotheexistenceofeasy-to-reachsolutionsthatdonotnecessarilyrequirehardclusterassignmentsintheθ-steps.However,whenthenumberofclustersgrows,theperformanceofthemixtureofexpertsapproachdecreasesslightlywhilethatofthecollapsedmodelincreases.Hereitbecomesmoreandmoreimportanttoselecttheparametersinawaythatal-lowstodiscriminatewellbetweentheclusters,andthusthecollapsedalgorithmoutperformsthebaselinessignificantly.Thealternatingalgorithmandthek-meansbaselineperformsignificantlyworsethanthecollapsedalgorithm.Onlyfor20andmoreclustersthealternatingalgorithmproducesbet-terresultsthanthemixtureofexpertsmodel.Notethatthek-meansperformsworstasitdoesnotuseanalternat-ingupdateschemabutfirstlearnstheclusteringandthenestimatesthegenerativemodelsusingthefixedsegments.Itisapparentthatthepredictiveperformancelevelsoffafterincreasingthenumberofclustersbeyond10.Intu-itively,thisobservationcanbeexplainedbyatrade-offbe-tweenclassificationandsegmentation:evenifamorefine-grainedclusteringwouldbeabletopredictthefuturebehav-iormoreaccurately,theclassifiercannotdiscriminatewellbetweenalargernumberofsimilarclusterstoidentifythebest-matchingsegment.Weobserveanaturaltrade-offbe-tweenpredictivepowerandtheeffortthathastobespentfordevelopingandmaintainingtargetstrategiesforalargenumberofmarketsegments.
Theexecutiontimeofthecollapsedalgorithmforasolu-tionwith10clustersiswithintherangeof3hours,comparedtoaboutanhoureachforthemixtureofexpertsandthek-meansbaselines.Thealternatingoptimizationhowevertakesabout11hourswhichrendersitsapplicationinfeasi-bleinpractice.
4.4Discussion
Marketsegmentationaimsatgroupingsimilarindividu-alsofapopulationtogetherthatsharethesameneedsorthathavesimilardemands.Thegoalistotargetindividu-alswithinthesamesegmentjointlye.g.,toadvertiseanewproduct.Tothisend,thesegmentsneedtobeinterpretable
423
Figure5:Clickvolumesofcategoriesovertimeforthefourclusters.
Cluster1Cluster2Cluster3Cluster4Figure4:Visualizationofclickfrequenciesforthe
fivemostfrequentlinklocationsusingfourclusters.
toderiveaconcisedescriptionofthesegmentsthatcanbeconvertedintoasegment-specifictargetingstrategy.
Inourcollapsedalgorithm,generativemodelsineachseg-mentencodethecontainedbehaviorandinterest.Theflex-ibilityoftheprobabilisticinferencemachineryallowsustoprojectthebehaviorontodiscriminativevariablestovisu-alizedifferentcharacteristicsoftheclusters.Inthissectionwegivetwoexamplesforsuchprojectionstovisualizedif-ferentlydistributeduserbehavioracrosstheclustering.Forsimplicity,weuseasolutionwithfourclusters.
Thefirstexampleshowsavisualizationofsegment-specificuserclicksintermsoftheirlocationontheWebpage.In-cludingthelocationofclicksisnecessaryforalteringthelay-outdynamicallyaschangesinfrequentlyclickedareaswillhaveimpactthebehaviormorethansubstitutingaredun-dantandlessclickedwidget.WefocusonthefivemodulesoftheWebsitethatreceivethehighestnumberofclicksinthedata.
Figure4showstheresults.Segments2,3,and4exhibitverysimilarclickbehaviorintermsoftheclickedmodules.Bycontrast,cluster1differssignificantlyintheusageoftheWebcomponents.Onaverage,usersincluster1preferthelocationvisualizedinblackoverthealternativescompared
tousersintheothersegments.Thisobservationcouldbeex-ploitedtodirectlydevisetargetstrategies.Whilemembersofcluster2–4shouldbeaddressedbychangingthecontentofthemodulesvisualizedingrayordarkblue,usersinthefirstsegmentcouldalsobetriggeredbythemoduleencodedinblack.
Analogously,thebehaviorcouldbeprojectedonthecat-egoriestovisualizetherespectivedistributionofcategoriesforeachsegment.However,wechoosetoshowamoreinter-estingprojectionforlackofspace.Theincorporationofthetimestampsofthesessionsallowsustovisualizetheclus-tersintime.Asthefeaturerepresentationoftimestampsencompassesoneweek,Figure5showstheaveragecategorydistributionacrossthedaysoftheweekwheredifferentcol-orscorrespondtodifferentcategories.3
Apparently,theclustersdonotonlydifferintermsofthecategoriesbutalsospecializeoncertainperiodsintimebe-causethesegmentsareoptimizedusingallavailabledata,thatis,attributeandbehaviorencodingvariables.ThefirstclusterclearlyspecializesonSundaysandischaracterizedbyacleantopicdistribution.Thethreeotherclusteralsopos-sessdominantcategoriesbutfocusesmoreonworkingdaysthanonweekends.Cluster4containsthemostdiversesetofcategoriesandactslikeabasinforcategoriesthatarenotaseasytodiscriminate.Hereitbecomesobviousthatasolu-tionwithonlyfourclustersmaynotbeoptimalforthetaskathand.Whenweincreasethemaximalnumberclusters,thecategorydistributionofclustersbecomescleanerthatislesscategoriesarelikely.Additionally,clustersadaptbettertospecializedperiodssuchasworkingdaysorweekendsforlargerk.
Takingvarioussuchprojectionsintoaccountdescribessegmentsfromdifferentanglesandhelpstofindaconcisetargetingstrategy.Forinstance,knowingthearticlesthatarelikelytobereadinanongoingsessionhelpstoaddresstherespectiveuserinvariouswaysincludingdisplayingads.Incorporatingcontextinformationssuchastheclickbehav-iorofthesegments,finallyallowsfortailoringwebpagestoeachsegmentandtoincreasetheoveralluserexperience.
5.CONCLUSION
Westudieddiscriminativeclusteringforstructuredandcomplexresponsevariablesthatcanberepresentedasgen-erativemodels.Theproblemsettingmatchesmarketseg-mentationtaskswherepopulationsaretobesegmentedintodisjointgroups.Solvingmarketsegmentation-likeproblemsappropriatelynotonlyinvolvesaclusteringoftheindividu-alsbutalsolearningaclassifierthatdiscriminateswellbe-
tweenthesegments,forinstancetoallowforclassifyingnewcustomerstooneofthegroups.Thetwocomponentsneedtobelearnedjointlyandhaveaccesstodifferentpiecesofinformationabouttheindividuals:theclassifierneedstogroupindividualsonthebasisofaprioriavailableinfor-mationswhiletheclusteringaimsatgroupingpeoplewithsimilar(future)needsorbehavior.
Wedevisedtwoalgorithmsbasedonalternatingoptimiza-tionandcollapsedinference,respectively.EmpiricalresultsshowedthatthecollapsedvariantisnotonlymoreefficientbutalsopredictsaccuratelytheclickbehaviorofusersforYahoo!News.Thegenerativenatureoftheclusteringledtointerpretableclusters.Weshowedhowprojectionsoftheclusteringononlyafewvariablesallowedfortargetingthedetectedsegmentsindividuallyandcontributedtouserun-derstanding.
OurapproachisnotrestrictedtoYahoo!NewsandcangenerallybeappliedtoarbitrarymarketsegmentationtasksandotherWebsitestoimprovetheoveralluserexperience.Asourapproachisorthogonaltopersonalizedapproaches,futureworkwillstudytheintegrationofbothframeworks.
[10]R.Jacobs,M.Jordan,S.Nowlan,andG.Hinton.
Adaptivemixturesoflocalexperts.Neuralcomputation,3(1):79–87,1991.[11]S.C.Johnson.Hierarchicalclusteringschemes.
Psychometrika,2:241–254,1967.[12]M.JordanandR.Jacobs.Hierarchicalmixturesof
expertsandtheemalgorithm.Neuralcomputation,6(2):181–214,1994.[13]M.Y.Kiang,M.Y.Hu,andD.M.Fisher.An
extendedself-organizingmapnetworkformarketsegmentation–atelecommunicationexample.DecisionSupportSystems,42:36–47,2006.[14]S.P.Lloyd.Leastsquarequantizationinpcm.IEEE
TransactionsonInformationTheory,28(2):129–137,1982.[15]G.Mann,R.McDonald,M.Mohri,N.Silberman,and
D.Walker.Efficientlarge-scaledistributedtrainingofconditionalmaximumentropymodels.AdvancesinNeuralInformationProcessingSystems,22:1231–1239,2009.[16]M.Namvar,M.Gholamian,andS.KhakAbi.Atwo
phaseclusteringmethodforintelligentcustomersegmentation.In2010InternationalConferenceonIntelligentSystems,ModellingandSimulation,pages215–219.IEEE,2010.[17]J.Sinkkonen,S.Kaski,andJ.Nikkil¨a.Discriminative
clustering:Optimalcontingencytablesbylearningmetrics.MachineLearning:ECML2002,pages109–137,2002.[18]K.Wagstaff,C.Cardie,S.Rogers,andS.Schr¨odl.
Constrainedk-meansclusteringwithbackgroundknowledge.InProceedingsoftheInternationalConferenceonMachineLearning,2001.[19]M.WedelandW.Kamakura.Marketsegmentation:
conceptualandmethodologicalfoundations,volume8.Springer,2000.[20]L.Xu,J.Neufeld,B.Larson,andD.Schuurmans.
Maximummarginclustering.Advancesinneuralinformationprocessingsystems,17:1537–1544,2005.[21]J.Ye,Z.Zhao,andM.Wu.Discriminativek-means
forclustering.InAdvancesinNeuralInformationProcessingSystems,2007.[22]W.YuandG.Qiang.Customersegmentationofport
basedonthemulti-instancekernelk-aggregateclusteringalgorithm.InManagementScienceandEngineering,2007.ICMSE2007.InternationalConferenceon,pages210–215,2007.[23]B.Zhao,J.Kwok,andC.Zhang.Maximummargin
clusteringwithmultivariatelossfunction.InDataMining,2009.ICDM’09.NinthIEEEInternationalConferenceon,pages637–646.IEEE,2009.[24]B.Zhao,F.Wang,andC.Zhang.Efficientmulticlass
maximummarginclustering.InProceedingsofthe25thinternationalconferenceonMachinelearning,pages1248–1255.ACM,2008.
Acknowledgements
PartofthisworkwassupportedbytheGermanScienceFoundationunderthereferencenumberGA1615/1-1.
References
[1]N.Bansal,A.Blum,andS.Chawla.Correlation
clustering.MachineLearning,56(1-3):89–113,2004.[2]S.BickelandT.Scheffer.Multi-viewclustering.In
ProceedingsoftheIEEEinternationalconferenceondatamining.Citeseer,2004.[3]U.Brefeld,P.Geibel,andF.Wysotzki.Support
vectormachineswithexampledependentcosts.InProceedingsoftheEuropeanConferenceonMachineLearning,2003.[4]R.D’Andrade.U-statistichierarchicalclustering.
Psychometrika,4:58–67,1978.[5]F.DelaTorreandT.Kanade.Discriminativecluster
analysis.InProceedingsofthe23rdinternational
conferenceonMachinelearning,pages241–248.ACM,2006.[6]A.Dempster,N.Laird,andD.Rubin.Maximum
likelihoodfromincompletedataviatheemalgorithm.JournaloftheRoyalStatisticalSociety.SeriesB(Methodological),pages1–38,1977.[7]P.D’UrsoandL.D.Giovanni.Temporal
self-organizingmapsfortelecommunicationsmarketsegmentation.Neurocomput.,71:2880–2892,2008.[8]R.Gomes,A.Krause,andP.Perona.Discriminative
clusteringbyregularizedinformationmaximization.InAdvancesinNeuralInformationProcessingSystems,2010.[9]J.Huang,G.Tzeng,andC.Ong.Marketing
segmentationusingsupportvectorclustering.Expertsystemswithapplications,32(2):313–317,2007.
425
因篇幅问题不能全部显示,请点此查看更多更全内容