您的当前位置:首页正文

Discriminative Clustering for Market Segmentation

2020-02-28 来源:好走旅游网
DiscriminativeClusteringforMarketSegmentation

PeterHaider

Dep.ofComputerScienceUniv.ofPotsdam,Germany

haider@cs.uni-potsdam.de

chiarluc@yahoo-inc.com

WebResearchGroupUniversitatPompeuFabra

Barcelona,Spain

LucaChiarandini

brefeld@uni-bonn.de

UniversityofBonnBonn,Germany

UlfBrefeld

ABSTRACT

Westudydiscriminativeclusteringformarketsegmentationtasks.Theunderlyingproblemsettingresemblesdiscrimi-nativeclustering,however,existingapproachesfocusonthepredictionofunivariateclusterlabels.Bycontrast,marketsegmentsencodecomplex(future)behavioroftheindivid-ualswhichcannotberepresentedbyasinglevariable.Inthispaper,wegeneralizediscriminativeclusteringtostruc-turedandcomplexoutputvariablesthatcanberepresentedasgraphicalmodels.Wedevisetwonovelmethodstojointlylearntheclassifierandtheclusteringusingalternatingop-timizationandcollapsedinference,respectively.Thetwoapproachesjointlylearnadiscriminativesegmentationoftheinputspaceandagenerativeoutputpredictionmodelforeachsegment.WeevaluateourmethodsonsegmentingusernavigationsequencesfromYahoo!News.Theproposedcollapsedalgorithmisobservedtooutperformbaselineap-proachessuchasmixtureofexperts.Weshowcaseexem-plaryprojectionsoftheresultingsegmentstodisplaytheinterpretabilityofthesolutions.

CategoriesandSubjectDescriptors

I.5.3[Clustering]:Algorithms

GeneralTerms

Algorithms,Experimentation

Keywords

DiscriminativeClustering,MarketSegmentation

1.INTRODUCTION

Marketsegmentationrevealsdivisionsinagivenmarket,whereamarketreferstoapopulationofinterestsuchas

criminatedfromeachotherbytheclassifier.Combiningthetwocriteriathusguaranteesconciseclusteringsandaccu-rateclassifiers.Existingapproachesfocusonclusteringapopulationandpredictingaclusterlabelforanewinstance.Bycontrast,marketsegmentationismorecomplex.Inmar-ketsegmentationtasks,weneedtodifferentiatebetweenthedatathatcharacterizesindividualsandthedatathatcharac-terizestheirfuturebehavior.Theclusteringclearlyneedstotakeallavailableinformationintoaccounttogeneratemean-ingfulsegments.However,theclassifierdoesnothaveaccesstofutureeventsandneedstotakeadecisionontheavailableinformationsuchasgender,income,etc.Thisobservationrendersexistingapproachestodiscriminativeclusteringtoorestrictiveformarketsegmentationtasks.

Inthispaperwegeneralizediscriminativeclusteringformarketsegmentationtasksusingthestructuredpredictionframework.Wedifferentiatebetweenattributesofacus-tomerandherinterests/behavior.Attributesareaprioriavailablefeaturesofindividualsofthepopulationsuchasgenderorincome.Herbehaviorisacollectionofinteract-ingvariablesdescribingasegment.Assegmentsneedtobeinterpretable,wemodeltheoutputdataasacomplexandstructuredvariablewhichcanberepresentedasagraphicalmodel.Thedistinctionallowsforlearningaclassifieronlyontheattributes,computingtheclusteringonbothattributesandbehavior,andfinallysummarizingthesegmentsonlyintermsofthebehavior.

Wedevisetwosolutionswhicharebasedontheregular-izedempiricalriskminimizationframework.Thefirstisastraightforwardadaptationofmixturesofexperts.Classi-fierandclusteringareoptimizedusinganalternatingstrat-egywherewefixonecomponentwhileoptimizingtheother.Thesecondsolutionusesapproximationsandintegratesoutparametersoftheclassifierusingcollapsedinferenceforeffi-ciency.Bothapproachesusegenerativemodelsfortheout-putstructureand,incontrasttoconventionaldiscriminativeclusteringapproaches,donotinvolvetrade-offparametersforclassificationaccuracyandclusterconsistency(classbal-ance)becausetheoptimizationproblemsarenotpronetotrivialanddegeneratesolutions.

Usecasesofourmethodscontaintraditionalmarketseg-mentationtasks.Considerforinstanceacompanythataimsatpromotinganewproductorahotelchainthatintendstolurevisitorswithspecialoffers.Ourmethodsnotonlycom-puteameaningfulsegmentationofthecustomersbutalsoallowfordevisingappropriatetargetingstrategiesfromthegraphicalmodels.Moreover,ourmethodservesasdiscrim-inativeclusteringforstructuredvariables,wherethetaskisnottooutputasingleclass/clusterlabelbuttheaveragestructureforeverysegment.Thedifferentiationbetweenattributesandbehaviorincreasestherangeofapplicationsthatcanbeaddressed.Aspecial–butstillnovel–caseisobtainedwhenattributesandbehaviorpartiallyoverlap.Empirically,westudyourmethodsonanotherinterestingusecase:SegmentingusernavigationsessionsontheWebfordisplayingsegment-specificwebsitelayouts.Weexperi-mentonalargeclicklogfromYahoo!News.Theattributedataisassembledfrommeta-informationaboutthesessionsuchasthetimestamp,thereferrerdomain,andthefirstpagerequest.Thebehaviorconsistsofsubsequentnaviga-tionactionsgivenbyclicksequences.Thegenerativerep-resentationofthebehaviordataisinterpretableandcanbeeasilytransformedintosegment-specificlayouts.

Theremainderofthepaperisstructuredasfollows.Sec-tion2discussestherelationshipofourproblemsettingwithpreviouslystudiedsettingsandmethods.InSection3wederivetwoalgorithmstooptimizetheempiricalcounterpartoftheexpectedsegmentedlog-likelihood.Section4reportsonempiricalresultsusingalargeclicklogfromacommercialnewsproviderandSection5concludes.

2.RELATEDWORK

Marketsegmentationtasksareoftensolvedusingneuralnetworkssuchasself-organizingmaps(13;7).Kiangetal.(13)forinstanceextendself-organizingmapstogroupcus-tomersaccordingtotheirattitudetowardsdifferentcommu-nicationsmodes.D’UrsoanddeGiovanni(7)usethenatu-ralclusteringpropertyofself-organizingmapstogetherwithdissimilaritymeasureswhichcapturetemporalstructureofthedata.Ingeneral,clusteringdatawithself-organizingmapsandvariantsthereofinherentlyimplementsthehomo-geneityassumptionofmarketsegmentation.However,clas-sifyingnewinstancesintotheclusteringisoftendifficultanditisnotpossibletooutputgenerativemodelstosummarizetheresultingclusters.Additionally,theoptimizationcrite-rionofself-organizingmapsishighlysensitivetotheactualinitializationandusuallyconvergestodifferentlocaloptima.Relatedtomarketsegmentationisthetaskofestimatingamixturemodelforobservations(6).Introducingselectorvariablesencodingprobabilisticcluster-memberships,maxi-mizingthelog-likelihoodbymarginalizingovertheselectorisusuallystraightforward.Theselectorcanbemodeledinadata-dependentordata-independentfashionbuttheprob-abilisticnatureofthecluster-membershipsrenderadirectapplicationformarketsegmentationtasksimpossible.

Discriminativeclusteringsimultaneouslycomputesaseg-mentationofthedataathandandaclassifierthatdiscrim-inatestheresultingclusterswell.Existingapproachesin-cludeprojectionsintolower-dimensionalsubspaces(5),jointoptimizationofmax-marginclassifiersandclusterings(20;24),theoptimizationofscattermetrics(21),andthemax-imizationofaninformationtheoreticcriteriontobalanceclassseparationandclassifiercomplexity(8).Sinkkonenetal.(17)aimtofindclustersthatarehomogeneousinauxiliarydatagivenbyadditionaldiscretevariables.Theabovementionedapproachesdonotpredictanyoutputvari-ablebutfocusonthediscreteclustervariable.Moreover,inourgeneralizedproblemsetting,instancesarerepresentedasinput-outputpairs.Theclassifierdiscriminatestheclustersgivenonlytheinput,whereastheclusterparametersneedtoaccuratelyestimatetheoutputsofthecontainedinstances.Previousworkondiscriminativeclusteringdoesnotsplitin-stancesintotwoparts.Theyrepresentinstancesasasingleinputwhichconsequentlyallowstheclassifierstoaccessthewholeexampleatdecisiontime.Thesameassumptioniscommonlybeingmadeinmarketsegmentationstudiesthatinvolvemodel-basedclusteringapproaches,(9;16;22)butprohibitsanaturalsolutionformarketsegmentationtasks.ThisproblemsettingcanbeseenasanalterationofthesettingwhichtheMixtureofExpertsapproach(10;12)aimstosolve,wherethebehavioryispredictedgiventheat-tributesxasamixturemodelwherethemixturecomponentweightsdependagainonx.Inourcase,mixturecomponentweightshavetobealwayspointdistributionsasdemandedbytheapplication.Framingthedistributionofygiventhemixturecomponentasapuregenerativemodelallowsusto

418

deriveamoreefficientalgorithmthanthatoftheMixtureofExpertsapproach.

Zhaoetal.(23)proposedamaximum-margincluster-ingformultivariatelossfunctions.Minimizingthecomplexlossesallowsforcapturingstructuraldifferencesthatcannotbeexpressedintermsofstandardmisclassificationrates.Inprinciple,bydefiningalossfunctionthatcapturesthedifferencesoftwoclusterings,onecouldpossiblysolvemar-ketsegmentationtasksastheirapproachimplicitlyfavorsclusteringsthatareeasilydiscriminable.However,thelossfunctioncannotbeexpressedintermsofthecontingencyta-bleofthetwoclusterings,andthedecodingproblemintheinnerloopofthealgorithm,thatisfindingthemostviolatedconstraint,becomesintractableinpractice.

Alsorelatedtoourproblemsettingaremulti-viewclus-teringapproaches,wherethedataissplitintotwodisjointfeaturesets,whicharesometimesalsocalledviews.BickelandScheffer(2)presentanintertwinedExpectationMaxi-mizationalgorithmtocomputethemaximumlikelihoodso-lutionforoneviewusingtheexpectationsprovidedbyitspeerview.Thetwodataviewsaremodeledgenerativelyandthealgorithmmaximizesthejointmarginallikelihood.Bycontrast,weaimtofindadiscriminativeclassifierontheinputviewandinsteadofmaximizingthemarginallikeli-hoodoftheoutputviewweseektomaximizethelikelihoodconditionedonahardclusterassignment.

maximizingtheexpectedriskfunctionalRthatisdefinedintermsofthesegmentedlog-likelihood

󰀈

R(h,θ)=logP(y|θh(x))dP(x,y).(1)SincethetruejointdistributionP(x,y)isunknown,were-placeEquation(1)byitsempiricalcounterpartonthefinite

marketsampleofsizengivenbyM={(xi,yi)}ni=1

ˆ(θ,h)=R

n󰀆i=1

logP(yi|θh(xi)).

(2)

DirectlymaximizingEquation(2)intermsofthecompo-nentparametersθandtheclassifierhisinfeasible,sincetheobjectivefunctionisnotonlyhighlynon-convexandnon-continuousbutanNP-hardproblembecauseofcombinato-rialassignments.However,iftheclassifierhwasfixed,theθjcouldbeoptimizeddirectly,ashprovidesthesegmentation

ˆjaretriviallyandforeachsegmentjoptimalparametersθ

computedby

󰀆

ˆj=argmaxθlogP(yi|θ).(3)

θ

i:h(xi)=j

3.DISCRIMINATIVESEGMENTATION

Wenowpresentourmaincontribution,thegeneralization

ofdiscriminativeclusteringforstructuredoutputvariablestosolvemarketsegmentationproblems.Weintroducetheproblemsettinginthenextsectionandpresentastraightfor-wardsolutionintermsofmixturesofexpertsinSection3.2.AnefficientapproximationisdevisedinSection3.3andSec-tion3.3.3discussesscalabilityissues.

Formanycommondistributionfamilies,themaximumlike-lihoodestimatesP(y|θ)canbecomputedeasilybycountingoraveragingovertheobservationsinthesegment,respec-tively.Viceversa,keepingthesegmentparametersθ1,...,θkfixed,learningtheclassifierhresultsinastandardmulti-classclassificationscenario.Usinglinearmodels,hcanbewrittenas

h(x)=argmaxwjx,

j∈{1,...,k}

(4)

3.1Preliminaries

WearegivenasampleMfromamarketwhereindividuals

arerepresentedbytuples(x,y)∈X×Yencodingattributesxandbehaviory.Attributesxmayencompassindividualfeatureslikegender,income,etcwhiletheexpressedhistoricbehavioriscapturedbyy∈Yandrepresentedasagraph-icalmodel.ThebehaviorsyaregovernedbyafamilyofdistributionsdenotedbyP(y|θ)withnaturalparameterθ.InourrunningexampleonsegmentingusernavigationontheWeb,attributesxencodemeta-informationaboutthesessionsuchasthetimestamp,thereferrerdomain,andthefirstpagerequestandisrepresentedasafeaturevector.ThebehavioryencodessequencesofthesubsequentWebnavigationandcanforinstanceberepresentedasaMarkov-chainwherenodescorrespondtopageviewsandconnectingedgesvisualizeclicks.

Weaimtofindanappropriatesegmentationofthemar-ketM.Formally,thegoalistofindaclassifierh:X→{1,...,k}thatmapsattributesxtooneofkclusters,pa-rameterizedbyθ=(θ1,...,θk),wheretheθjarechosentomaximizethelikelihoodoftheobservedbehaviorsyoftherespectivesegment.Thenumberofclusterskisassumedtobegivenbytheapplicationathand,becauseitconsti-tutesatrade-offbetweenpredictivepowerandeffortspentfordevelopingmultiplemarketstrategies.Ashandθarenotindependenttheyneedtobeoptimizedjointly.Hence,overallclassifiershandparametercollectionsθ,weaimat

whereeachsegmenthasitsownweightvectorwj.Inthere-mainder,wewillusehandw=(w1,...,wk)⊤interchange-ably.Thenextsectionexploitsthisobservationandpresentsajointalternatingoptimizationscheme.

3.2AnAlternatingOptimizationScheme

Astraightforwardapproachtosolvemarketsegmentationproblemsistoalternatetheoptimizationoftheclassifierandtheclusteringwhilefixingtheother,respectively.AsshowninEquation(3),keepingtheclassifierfixedallowstoapplystandardmaximumlikelihoodtechniquestocomputethenaturalparametersofthesegments.Wethusfocusonderivingtheclassifierhforafixedclustering.Wemakeuseofthemaximum-marginframeworkanddeployare-scaledvariantofthehingelosstodealwithuncertaincluster-memberships(orclasslabels).

Theideaisasfollows.Intuitively,anindividual(x,y)shouldbeassignedtothesegmentthatrealizesthehigh-estlog-likelihoodwithrespecttoy.However,twoormoresegmentsmightbecompetingfortheinstanceandrealize

1:Input:(x1,y1),...,(xn,yn),λ>0,k>12:Initializeθrandomly3:repeat4:E-step:w←argminw′,ξ≥0λ5:

s.t.

(wj∗

i

i=1ξin

′⊤

−wj)x≥1−ξi

󰀁n

419

similarlog-likelihoods,inwhichcaseawinner-takes-allde-cisionisprohibitive.Wethustreatthedifferenceofthelog-likelihoodsbetweenthemostlikelysegmentj∗andclusterj′=j∗asamisclassificationscore,givenby

s(y)=logP(y|θj∗)−logP(y|θj′).

(5)

Thesescorescanbeincorporatedinasupportvectorma-chinebyre-scalingthehingelossandactlikeexample-de-pendentcosts(3).There-scaledhingelossbecomesaconvexupperboundofthedifferenceofthelog-likelihoods,

ℓ(x)=s(y)max󰀂0,1−(wj∗−wj′)⊤x=(ξ󰀃

.(6)

Stackingupw=(w1,...,wk)⊤andξ1,...,ξn)⊤,we

arriveatthefollowingmaximum-marginoptimizationprob-lem

wmin

λ

,ξ≥0

n

󰀆nξi

i=1

s.t.

(w∗−wj)⊤

x≥1ξi

ji

−whichstillcontainsthemutuallydependent󰀁

j′

exp(ρw⊤

j′xi)

,(7)

variablesθand

w.Toobtainanefficientlysolvableoptimizationproblem,weexpresstheobjectiveasacontinuousfunctionofwsothatwcanbeeliminatedusingcollapsedinference.InsteadofthehingelossinEquation(6),weemployanothertightconvexupperboundintermsofthesquaredloss,

ℓ(x)=(logP(y|θj)−w⊤

jx)2.

Implicitly,introducingthesquaredlossconvertstheclassi-fierintoaregressorthataimsatpredictingthelog-likelihood

foranindividual(x,y)andthej-thsegmentasaccurateaspossible.Assumingthelog-likelihoodswerepredictedperfectly,theparameterswwouldnotonlybeoptimalfortheregressionbutalsoforEquation(2)astheclassifierhinEquation(4)wouldstillreturnthemostlikelysegment.Changingthelossfunctionalsohastheadvantagethatnowtheoptimalsolutionforwcanbecomputedanalytically.Thecorrespondingregularizedoptimizationproblemisalsoknownasregularizedleastsquaresregression(RLSR)orridgeregressionandisgivenby

min

λ

w

longerdependonwandthathasonlytheθjasfreeparam-eters,

max

θ

n󰀆i=1

log

k󰀆j=1

P(yi|θj)

exp(ρπ(θj)⊤x¯i)

1:

2:3:4:5:6:7:8:9:10:

Input:(x1,y1),...,(xn,yn),λ

ρ←1,t←1,initializeθ(0)randomlyrepeat

E-step:Q(zi=j)←P(zi=j|xi,yi,ρ,λ,θ(t−1))

󰀁󰀁

M-step:θ(t)←argmaxθijQ(zi=j)×

logP(yi,zi=j|xi,ρ,λ,θ)

ρ←ρ×1.1,t←t+1untilconvergenceˆ=argmaxRˆ(θ,α(θ))θθ

ˆ)α←α(θ

icalFollowingriskfunctionalthegeneralinEMEquationframework,󰀁

j′

exp(ρπ(θj)⊤x¯i)

.

(11)inwetermsexpressoftheempir-expec-tationszi,j.Thisallowsustoeffectivelypullthelogarithm

intothesumoversegmentsfortheM-step;wearriveattheoptimizationproblem

maxθ

󰀆󰀆zi,jlog󰀄

P(yi|θj)

exp(ρπ(θj)⊤x¯i)i

j

󰀁

j

exp(ρπ(θoldj)⊤x¯i)

aretheTaylorcoeffi-

cientsandCisaconstant.SubstitutingEquation(12)intotheobjectivefunctionoftheM-stepandcollectingthecoef-ficientsgivesus

argmax󰀆󰀆logP(yi|θj)󰀄

zi,j+ρ󰀆x¯i′i󰀂zi′,j)−

θ

i

j

i′

󰀆

zi′,j′ti′j′

j′

󰀃

󰀅

,(13)

algorithmeffectivelyintractable.Wecanalleviatethisbyrandomlypartitioningtheexamplesintheleast-squareses-timationinEquation(8)intosdisjointsubsetsS(1),...,S(s)ofsizem.Foreachsubsettheweightvectorsw(l)areesti-matedseparately,andthuswithineachsubsetthevectorsπ(θj)andthetransformedexamplesx¯haveonlymcompo-nents.Consequently,inEquation(13)theinnersummationovertheexamplesonlyrunsoverthemexamplesinthesubsettowhichexample(xi,yi)belongs.Finally,weobtaintheparametersoftheclassifierhbyaveragingtheweightvectorsoverallsubsets,wj=1

AtypicaltargetingstrategyinourYahoo!NewsexamplecouldforinstancebeadynamiclayoutoftheWebsitetoadvertisenewsarticlesofcategoriesthattherespectiveuserisprobablyinterestedin.

Fromadataperspective,modelingsequencesofclickedcategoriesbyMarkovprocessesisstraightforward.How-ever,Markovprocesses,e.g.,visualizedbytransitionmatri-ces,aredifficulttointerpretastheentriesencodeinterestswithrespecttothepreviouscategory.TakingtheinferredMarkovmodelproperlyintoaccountwouldimplychangingthewebsitelayoutwithinasessiondependingontheprevi-ouscategory.Asimplerwaytoobtaininterpretableclustersistousemultinomialdistributionsfortheoutputvariablesofinterest.Weusethesequencesofuserclicksenrichedwiththerespectivelocationsoftheclicks.Thatis,thebehavioryconsistsofthemulti-setofsubsequentlyclickedcategoriescandlinksectionss.ThedistributionP(y|θj)isdefinedastheproductofmultinomialdistributions

󰀇󰀇

P(y|θ)=P(ci|µ)P(sj|ν),

l

j

whereµandνaretheparametervectorsgoverningthedis-tributionsovercategoriesandlinksections,respectively.

Theattributesxofasessionisrepresentedasabinaryfeaturevectorencodingthemostcommonreferrerdomains,therespectivecategoryofthefirstpageview,aswellasfea-turesencodingthetimestamp;weusebinaryindicatorsforeverydayoftheweek,foreachhouroftheday,andforeachhouroftheweek.Forthecollapsedalgorithm,weusealinearkernelandrandomlypartitionthetrainingdataintodisjointsubsetsofsize1,000forcomputingthepredictedlog-likelihoods.

4.1Baselines

Wecomparethecollapsedalgorithmwiththreebaselines,thealternatingoptimizationschemeinSection3.2,amix-tureofexpertsmodelandak-meansbasedsolution.Themixtureofexpertsmodel(12)minimizesthesquarederrorintermsofthewithin-clusterlog-likelihoodsandoptimizesthemarginallikelihood

󰀆󰀆

logP(yi|θj)P(zi=j|x).

i

j

Allprocessingisanonymousandaggregated

Colorsoccurmorethanonceduetothelargenumberofcategories.

2

1

ThemixtureofexpertsmodelisoptimizedwithastandardEM-algorithmandthereforeprovidesonlyprobabilisticclus-terassignmentsanddoesnottakeintoaccountthatsessionsneedtobeassignedtoonlyasinglecluster.

Thethirdbaselineisderivedfromthestraightforward,yetsomewhatna¨ıve,approachtosegmenttheinputspacefirstandonlythenoptimizethegenerativemodelineachcluster.Thedrawbackofthisnon-iterativeapproachisthatitdoesgenerallynotleadtohomogeneousbehaviorwithinclustersbecausethesegmentsarefixedwhenestimatingthegenera-tivemodels.Weusek-meansforfindingtheclustering,andestimatesegmentparamtersθbymaximumlikelihoodbasedonthehardclusterassignments.Theclassifierhclassifiesanewinstanceintotheclusterwiththenearestcentroid.

InsteadofthepriorP(z=j)wehaveaconditionaldistribu-tionP(z=j|x)whichisdefinedinanalogytothecollapsedalgorithmas

󰀆

P(z=j|x)∝exp(αj,ik(xi,x)).

i

422

Ineachsetting,everyalgorithmisdeployed10timeswithrandomparameterinitializationsandintheremainderweonlyreporttheresultsoftherunwithhighesttraininglike-lihood.

4.2Convergence

Inthissection,weevaluatetheconvergencebehaviorof

thecollapsedalgorithm.Recallthatthecollapsedalgorithmoptimizesanapproximateobjective,wherethehardclus-terassignmentsarereplacedbyasoft-maxcontrolledbyanincreasingfactorρ.Tocancelouteffectscausedbytheapproximation,wesubstitutetheresultingθintotheex-actoptimizationcriterioninEquation(2)andmeasuretherespectiveobjectivevalue.Notethattheresultsdonotnec-essarilyincreasemonotonically.

Figure3:Averagedpredictiveperformanceandstandarderror.

Figure2:Objectivevaluesforthecollapsedalgo-rithm(solid)andthemixtureofexpertsbaseline(dashed),fordifferentnumbersofclustersk.Figure2showstheresultsfordifferentnumbersofclus-tersforthecollapsedalgorithm(solidcurves).Forcompar-ison,wealsoaddedthemixtureofexpertsbaseline(dashedcurves).Asexpected,thetrueobjectivevalueisnotmono-tonic,sincebothalgorithmsoptimizeanapproximationtotheexactoptimizationcriterion.Thefigurealsoshowsthatthebestvaluesareobtainedafteratmost20iterations.

4.3PredictivePerformance

Toevaluatetheperformanceofthecollapsedalgorithm,wemeasureitspredictiveaccuracyintermsofhowwellfu-turebehaviorcanbepredicted.Theclassifierandtheseg-mentationarelearnedjointlyasdescribedinSection3us-ingthetrainingsetandthendeployedtothetestset.Thesessionsinthetestsetarefirstclassifiedbytheclassifierinoneofthesegmentswhichisthenusedtopredictthefutureclicksoftheuser.Sincethefinalpredictionisacom-plexvariable,werefrainfromexpressingtheperformanceintermsoferrorratesandmeasurethepredictivelog-likelihoodlogP(y|θh(x))instead.Wecomparethecollapsedalgorithmtothealternatingoptimizationscheme,themixtureofex-pertsmodel,andthek-meansbasedsolution.Wereportonaveragesandstandarderrorsover10repetitionswithdiffer-entrandominitializations.

Figure3showsthepredictiveperformanceforvaryingnumbersofclusters.Notsurprisingly,allmethodsperform

equallyworseforonlyasinglecluster.Foronlyafewclus-ters,themixtureofexpertsbaselineperformsaboutaswellasthecollapsedalgorithm.Wecreditthisfindingtotheexistenceofeasy-to-reachsolutionsthatdonotnecessarilyrequirehardclusterassignmentsintheθ-steps.However,whenthenumberofclustersgrows,theperformanceofthemixtureofexpertsapproachdecreasesslightlywhilethatofthecollapsedmodelincreases.Hereitbecomesmoreandmoreimportanttoselecttheparametersinawaythatal-lowstodiscriminatewellbetweentheclusters,andthusthecollapsedalgorithmoutperformsthebaselinessignificantly.Thealternatingalgorithmandthek-meansbaselineperformsignificantlyworsethanthecollapsedalgorithm.Onlyfor20andmoreclustersthealternatingalgorithmproducesbet-terresultsthanthemixtureofexpertsmodel.Notethatthek-meansperformsworstasitdoesnotuseanalternat-ingupdateschemabutfirstlearnstheclusteringandthenestimatesthegenerativemodelsusingthefixedsegments.Itisapparentthatthepredictiveperformancelevelsoffafterincreasingthenumberofclustersbeyond10.Intu-itively,thisobservationcanbeexplainedbyatrade-offbe-tweenclassificationandsegmentation:evenifamorefine-grainedclusteringwouldbeabletopredictthefuturebehav-iormoreaccurately,theclassifiercannotdiscriminatewellbetweenalargernumberofsimilarclusterstoidentifythebest-matchingsegment.Weobserveanaturaltrade-offbe-tweenpredictivepowerandtheeffortthathastobespentfordevelopingandmaintainingtargetstrategiesforalargenumberofmarketsegments.

Theexecutiontimeofthecollapsedalgorithmforasolu-tionwith10clustersiswithintherangeof3hours,comparedtoaboutanhoureachforthemixtureofexpertsandthek-meansbaselines.Thealternatingoptimizationhowevertakesabout11hourswhichrendersitsapplicationinfeasi-bleinpractice.

4.4Discussion

Marketsegmentationaimsatgroupingsimilarindividu-alsofapopulationtogetherthatsharethesameneedsorthathavesimilardemands.Thegoalistotargetindividu-alswithinthesamesegmentjointlye.g.,toadvertiseanewproduct.Tothisend,thesegmentsneedtobeinterpretable

423

Figure5:Clickvolumesofcategoriesovertimeforthefourclusters.

Cluster1Cluster2Cluster3Cluster4Figure4:Visualizationofclickfrequenciesforthe

fivemostfrequentlinklocationsusingfourclusters.

toderiveaconcisedescriptionofthesegmentsthatcanbeconvertedintoasegment-specifictargetingstrategy.

Inourcollapsedalgorithm,generativemodelsineachseg-mentencodethecontainedbehaviorandinterest.Theflex-ibilityoftheprobabilisticinferencemachineryallowsustoprojectthebehaviorontodiscriminativevariablestovisu-alizedifferentcharacteristicsoftheclusters.Inthissectionwegivetwoexamplesforsuchprojectionstovisualizedif-ferentlydistributeduserbehavioracrosstheclustering.Forsimplicity,weuseasolutionwithfourclusters.

Thefirstexampleshowsavisualizationofsegment-specificuserclicksintermsoftheirlocationontheWebpage.In-cludingthelocationofclicksisnecessaryforalteringthelay-outdynamicallyaschangesinfrequentlyclickedareaswillhaveimpactthebehaviormorethansubstitutingaredun-dantandlessclickedwidget.WefocusonthefivemodulesoftheWebsitethatreceivethehighestnumberofclicksinthedata.

Figure4showstheresults.Segments2,3,and4exhibitverysimilarclickbehaviorintermsoftheclickedmodules.Bycontrast,cluster1differssignificantlyintheusageoftheWebcomponents.Onaverage,usersincluster1preferthelocationvisualizedinblackoverthealternativescompared

tousersintheothersegments.Thisobservationcouldbeex-ploitedtodirectlydevisetargetstrategies.Whilemembersofcluster2–4shouldbeaddressedbychangingthecontentofthemodulesvisualizedingrayordarkblue,usersinthefirstsegmentcouldalsobetriggeredbythemoduleencodedinblack.

Analogously,thebehaviorcouldbeprojectedonthecat-egoriestovisualizetherespectivedistributionofcategoriesforeachsegment.However,wechoosetoshowamoreinter-estingprojectionforlackofspace.Theincorporationofthetimestampsofthesessionsallowsustovisualizetheclus-tersintime.Asthefeaturerepresentationoftimestampsencompassesoneweek,Figure5showstheaveragecategorydistributionacrossthedaysoftheweekwheredifferentcol-orscorrespondtodifferentcategories.3

Apparently,theclustersdonotonlydifferintermsofthecategoriesbutalsospecializeoncertainperiodsintimebe-causethesegmentsareoptimizedusingallavailabledata,thatis,attributeandbehaviorencodingvariables.ThefirstclusterclearlyspecializesonSundaysandischaracterizedbyacleantopicdistribution.Thethreeotherclusteralsopos-sessdominantcategoriesbutfocusesmoreonworkingdaysthanonweekends.Cluster4containsthemostdiversesetofcategoriesandactslikeabasinforcategoriesthatarenotaseasytodiscriminate.Hereitbecomesobviousthatasolu-tionwithonlyfourclustersmaynotbeoptimalforthetaskathand.Whenweincreasethemaximalnumberclusters,thecategorydistributionofclustersbecomescleanerthatislesscategoriesarelikely.Additionally,clustersadaptbettertospecializedperiodssuchasworkingdaysorweekendsforlargerk.

Takingvarioussuchprojectionsintoaccountdescribessegmentsfromdifferentanglesandhelpstofindaconcisetargetingstrategy.Forinstance,knowingthearticlesthatarelikelytobereadinanongoingsessionhelpstoaddresstherespectiveuserinvariouswaysincludingdisplayingads.Incorporatingcontextinformationssuchastheclickbehav-iorofthesegments,finallyallowsfortailoringwebpagestoeachsegmentandtoincreasetheoveralluserexperience.

5.CONCLUSION

Westudieddiscriminativeclusteringforstructuredandcomplexresponsevariablesthatcanberepresentedasgen-erativemodels.Theproblemsettingmatchesmarketseg-mentationtaskswherepopulationsaretobesegmentedintodisjointgroups.Solvingmarketsegmentation-likeproblemsappropriatelynotonlyinvolvesaclusteringoftheindividu-alsbutalsolearningaclassifierthatdiscriminateswellbe-

tweenthesegments,forinstancetoallowforclassifyingnewcustomerstooneofthegroups.Thetwocomponentsneedtobelearnedjointlyandhaveaccesstodifferentpiecesofinformationabouttheindividuals:theclassifierneedstogroupindividualsonthebasisofaprioriavailableinfor-mationswhiletheclusteringaimsatgroupingpeoplewithsimilar(future)needsorbehavior.

Wedevisedtwoalgorithmsbasedonalternatingoptimiza-tionandcollapsedinference,respectively.EmpiricalresultsshowedthatthecollapsedvariantisnotonlymoreefficientbutalsopredictsaccuratelytheclickbehaviorofusersforYahoo!News.Thegenerativenatureoftheclusteringledtointerpretableclusters.Weshowedhowprojectionsoftheclusteringononlyafewvariablesallowedfortargetingthedetectedsegmentsindividuallyandcontributedtouserun-derstanding.

OurapproachisnotrestrictedtoYahoo!NewsandcangenerallybeappliedtoarbitrarymarketsegmentationtasksandotherWebsitestoimprovetheoveralluserexperience.Asourapproachisorthogonaltopersonalizedapproaches,futureworkwillstudytheintegrationofbothframeworks.

[10]R.Jacobs,M.Jordan,S.Nowlan,andG.Hinton.

Adaptivemixturesoflocalexperts.Neuralcomputation,3(1):79–87,1991.[11]S.C.Johnson.Hierarchicalclusteringschemes.

Psychometrika,2:241–254,1967.[12]M.JordanandR.Jacobs.Hierarchicalmixturesof

expertsandtheemalgorithm.Neuralcomputation,6(2):181–214,1994.[13]M.Y.Kiang,M.Y.Hu,andD.M.Fisher.An

extendedself-organizingmapnetworkformarketsegmentation–atelecommunicationexample.DecisionSupportSystems,42:36–47,2006.[14]S.P.Lloyd.Leastsquarequantizationinpcm.IEEE

TransactionsonInformationTheory,28(2):129–137,1982.[15]G.Mann,R.McDonald,M.Mohri,N.Silberman,and

D.Walker.Efficientlarge-scaledistributedtrainingofconditionalmaximumentropymodels.AdvancesinNeuralInformationProcessingSystems,22:1231–1239,2009.[16]M.Namvar,M.Gholamian,andS.KhakAbi.Atwo

phaseclusteringmethodforintelligentcustomersegmentation.In2010InternationalConferenceonIntelligentSystems,ModellingandSimulation,pages215–219.IEEE,2010.[17]J.Sinkkonen,S.Kaski,andJ.Nikkil¨a.Discriminative

clustering:Optimalcontingencytablesbylearningmetrics.MachineLearning:ECML2002,pages109–137,2002.[18]K.Wagstaff,C.Cardie,S.Rogers,andS.Schr¨odl.

Constrainedk-meansclusteringwithbackgroundknowledge.InProceedingsoftheInternationalConferenceonMachineLearning,2001.[19]M.WedelandW.Kamakura.Marketsegmentation:

conceptualandmethodologicalfoundations,volume8.Springer,2000.[20]L.Xu,J.Neufeld,B.Larson,andD.Schuurmans.

Maximummarginclustering.Advancesinneuralinformationprocessingsystems,17:1537–1544,2005.[21]J.Ye,Z.Zhao,andM.Wu.Discriminativek-means

forclustering.InAdvancesinNeuralInformationProcessingSystems,2007.[22]W.YuandG.Qiang.Customersegmentationofport

basedonthemulti-instancekernelk-aggregateclusteringalgorithm.InManagementScienceandEngineering,2007.ICMSE2007.InternationalConferenceon,pages210–215,2007.[23]B.Zhao,J.Kwok,andC.Zhang.Maximummargin

clusteringwithmultivariatelossfunction.InDataMining,2009.ICDM’09.NinthIEEEInternationalConferenceon,pages637–646.IEEE,2009.[24]B.Zhao,F.Wang,andC.Zhang.Efficientmulticlass

maximummarginclustering.InProceedingsofthe25thinternationalconferenceonMachinelearning,pages1248–1255.ACM,2008.

Acknowledgements

PartofthisworkwassupportedbytheGermanScienceFoundationunderthereferencenumberGA1615/1-1.

References

[1]N.Bansal,A.Blum,andS.Chawla.Correlation

clustering.MachineLearning,56(1-3):89–113,2004.[2]S.BickelandT.Scheffer.Multi-viewclustering.In

ProceedingsoftheIEEEinternationalconferenceondatamining.Citeseer,2004.[3]U.Brefeld,P.Geibel,andF.Wysotzki.Support

vectormachineswithexampledependentcosts.InProceedingsoftheEuropeanConferenceonMachineLearning,2003.[4]R.D’Andrade.U-statistichierarchicalclustering.

Psychometrika,4:58–67,1978.[5]F.DelaTorreandT.Kanade.Discriminativecluster

analysis.InProceedingsofthe23rdinternational

conferenceonMachinelearning,pages241–248.ACM,2006.[6]A.Dempster,N.Laird,andD.Rubin.Maximum

likelihoodfromincompletedataviatheemalgorithm.JournaloftheRoyalStatisticalSociety.SeriesB(Methodological),pages1–38,1977.[7]P.D’UrsoandL.D.Giovanni.Temporal

self-organizingmapsfortelecommunicationsmarketsegmentation.Neurocomput.,71:2880–2892,2008.[8]R.Gomes,A.Krause,andP.Perona.Discriminative

clusteringbyregularizedinformationmaximization.InAdvancesinNeuralInformationProcessingSystems,2010.[9]J.Huang,G.Tzeng,andC.Ong.Marketing

segmentationusingsupportvectorclustering.Expertsystemswithapplications,32(2):313–317,2007.

425

因篇幅问题不能全部显示,请点此查看更多更全内容