"User Behavior Portrait" learning (chapters 1-4)

Table of Contents

Chapter Overview

Data Sources

Portrait characteristics

Application field

Chapter Two Portrait Modeling

Two parts of user portrait

Chapter Three Group User Profile Analysis

Main process

User portrait similarity

User portrait clustering

Chapter Four User Portrait Management

Storage mechanism

Query mechanism

Regular update mechanism

Chapter Overview

User portrait: a depiction of similar users along different dimensions

User role: the distinct roles that different users play in the business system

User attributes: descriptions of the user's characteristics, such as gender, age, and so on

Data Sources

User attributes: static portrait

User behavior: dynamic portrait

Portrait characteristics

Time and space limitations: in terms of time, the goal is to provide personalized services through an accurate depiction of the user; in terms of space, different fields have different focuses, so each user portrait should be designed around the characteristics of its own field.

Application field

Search engines

Recommender systems

Business customization

Chapter Two Portrait Modeling

Portrait modeling means labeling (tagging) user information.

The core of user portrait modeling is to express and store the user's latent intentions and interests. Starting from the user's basic information, video information, access information, behavioral preferences, and implicit interests, it distills a user model that a computer can read and compute on.
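To make the idea of labeling concrete, here is a minimal Python sketch of a tag-based portrait. The class name `UserPortrait`, its fields, and the rule that each viewed video adds weight to all of its tags are illustrative assumptions, not a schema taken from the source.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserPortrait:
    """Hypothetical tag-based portrait: static attribute labels plus weighted interest tags."""
    user_id: str
    # Static portrait: labels taken from user attributes, e.g. {"gender": "female"}
    static_tags: dict = field(default_factory=dict)
    # Dynamic portrait: interest tag -> accumulated weight derived from behavior
    interest_tags: Counter = field(default_factory=Counter)

    def record_view(self, video_tags, weight=1.0):
        """Label one viewing event: every tag of the viewed video gains `weight`."""
        for tag in video_tags:
            self.interest_tags[tag] += weight

    def top_interests(self, n=3):
        """The n highest-weighted interest tags."""
        return [tag for tag, _ in self.interest_tags.most_common(n)]


p = UserPortrait("u1", {"gender": "female", "age": "25-34"})
p.record_view(["movie", "sci-fi"])
p.record_view(["tv_series", "sci-fi"])
print(p.top_interests(1))  # ['sci-fi']
```

Keeping static labels and behavior-derived tags in separate fields mirrors the static/dynamic split described under Data Sources.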

Two parts of user portrait

Qualitative portraits: basic user characteristics, such as behavior descriptions, interest models, and video representations.

Labels (tags) are the core of qualitative user portraits.

Examples include age tags and geographic tags. Tags have two important presentation features: they are semantic, and they are short text. Being semantic lets people understand the tags; being short text reduces preprocessing and makes it easier for computers to extract and aggregate tags for analysis.

Quantitative portraits: quantifiable data features, such as basic user variables and interest preferences.

Pay attention to the granularity of user portraits: the finer and more specific the portrait, the higher the cost of modeling.

The granularity should therefore be moderate; forms are then used to capture user behavior and to store and analyze the data. Forms are the most direct way to display the collected data.

A recommender system has two types of elements: users and items.

Constructing qualitative user portraits

An ontology is used to represent, verify, reason about, and interpret the tags in the user portrait domain.

An ontology consists of classes, attributes, instances, axioms, and inference rules.

Key steps of ontology construction

Build the domain vocabulary, i.e. the various tags

Determine the structure among categories: define the category relationships through hierarchical subdivision, e.g. videos are divided into TV series, movies, variety shows, and so on

Define attributes: object attributes and data attributes

Define instances: classes and attributes are the "skeleton" of the ontology, and instances are its "flesh and blood"

Define constraint axioms and inference rules: the constraints between class concepts
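The construction steps above can be sketched with a toy in-memory ontology. This is an illustration under assumptions, not an OWL/RDF implementation: the `Ontology` class is hypothetical, only data attributes are modeled, the single inference rule is transitivity of the is-a relation, and the single constraint axiom is that an instance may only use attributes allowed by its class or its superclasses.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical toy ontology: classes, is-a hierarchy, data attributes, instances."""
    parent: dict = field(default_factory=dict)      # class -> direct superclass
    attributes: dict = field(default_factory=dict)  # class -> set of attribute names
    instances: dict = field(default_factory=dict)   # instance -> (class, attribute values)

    def add_class(self, name, superclass=None, attrs=()):
        """Vocabulary, category structure, and attribute definitions in one step."""
        if superclass is not None:
            self.parent[name] = superclass
        self.attributes[name] = set(attrs)

    def ancestors(self, cls):
        """Inference rule (assumed): is-a is transitive, so collect every superclass."""
        chain = []
        while cls in self.parent:
            cls = self.parent[cls]
            chain.append(cls)
        return chain

    def allowed_attributes(self, cls):
        """A class inherits the attributes of all of its superclasses."""
        allowed = set(self.attributes[cls])
        for anc in self.ancestors(cls):
            allowed |= self.attributes[anc]
        return allowed

    def add_instance(self, name, cls, **values):
        """Constraint axiom (assumed): only attributes allowed for the class may be used."""
        if cls not in self.attributes:
            raise ValueError(f"unknown class: {cls}")
        unknown = set(values) - self.allowed_attributes(cls)
        if unknown:
            raise ValueError(f"attributes {sorted(unknown)} not allowed for {cls}")
        self.instances[name] = (cls, values)

    def is_a(self, instance, cls):
        own = self.instances[instance][0]
        return own == cls or cls in self.ancestors(own)


onto = Ontology()
onto.add_class("Video", attrs=["title", "duration"])
onto.add_class("TVSeries", "Video", attrs=["episodes"])
onto.add_class("Movie", "Video", attrs=["director"])
onto.add_class("VarietyShow", "Video")
onto.add_instance("example_film", "Movie", title="Example Film", director="A. Director")
print(onto.is_a("example_film", "Video"))  # True
```

Trying to give the movie instance an `episodes` value raises `ValueError`, which is the constraint axiom at work.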

Chapter Three Group User Profile Analysis

When designing a recommender system, the large number of users makes it impossible to build a specific portrait for every user. User portraits should therefore not only analyze target users but also include correlation analysis between users, i.e. group user portrait analysis. A group portrait depicts a group of real users, which helps designers discover the differentiating characteristics within the group and provide targeted services based on those differences.

Main process

User portrait acquisition

User portrait similarity calculation: compute the similarity between different user portraits. Similarity is an important indicator for distinguishing user groups and a prerequisite for user portrait clustering

User portrait clustering: cluster users based on their similarity

Group user portrait generation: build a representative, typical user portrait for each type of user

User portrait similarity

Quantitative similarity calculation

Quantitative features are usually numeric data, such as age and region. The similarity is calculated as:
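The formula itself is not shown in these notes. As one common choice (an assumption, not necessarily the author's formula), the sketch below normalizes each numeric feature by its value range and maps the Euclidean distance d between two users to a similarity 1 / (1 + d). The feature `monthly_hours` and the value ranges are hypothetical.

```python
import math


def quantitative_similarity(a, b, ranges):
    """Similarity in (0, 1] between two users' numeric features.

    a, b   : feature name -> numeric value
    ranges : feature name -> (min, max), used to put every feature on a 0-1 scale
    """
    squared = 0.0
    for feat, (lo, hi) in ranges.items():
        span = (hi - lo) or 1.0          # guard against a zero-width range
        diff = (a[feat] - b[feat]) / span
        squared += diff * diff
    distance = math.sqrt(squared)        # Euclidean distance on normalized features
    return 1.0 / (1.0 + distance)


ranges = {"age": (0, 100), "monthly_hours": (0, 100)}
u1 = {"age": 25, "monthly_hours": 30}
u2 = {"age": 35, "monthly_hours": 30}
print(round(quantitative_similarity(u1, u2, ranges), 3))  # 0.909
```

Normalizing first keeps a feature with a large numeric range from dominating the distance.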

Qualitative similarity calculation

Qualitative features are expressed as labels and have no exact numeric value, so their similarity is calculated in one of the following ways:

Method 1: label quantification, i.e. convert the label concept into a quantitative value

Method 2: calculate the similarity directly from the concepts

Concept information content method: determine the similarity of two concepts from the information content of their common parent concept.
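This description matches Resnik-style information-content similarity. A minimal sketch, assuming a hypothetical tag taxonomy and hypothetical occurrence counts: p(c) is the share of observations that fall on concept c or on any concept below it, IC(c) = log(1 / p(c)), and the similarity of two concepts is the largest IC among their common parent concepts.

```python
import math

# Hypothetical is-a taxonomy of interest tags (child -> parent)
PARENT = {
    "movie": "video", "tv_series": "video", "variety_show": "video",
    "sci_fi_movie": "movie", "comedy_movie": "movie",
}
# Hypothetical number of times each tag was observed
COUNTS = {
    "video": 5, "movie": 10, "tv_series": 20, "variety_show": 15,
    "sci_fi_movie": 30, "comedy_movie": 20,
}


def ancestors(c):
    """c itself followed by all of its parent concepts up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def probability(c):
    """p(c): share of observations on c or on any concept below it."""
    total = sum(COUNTS.values())
    covered = sum(n for tag, n in COUNTS.items() if c in ancestors(tag))
    return covered / total


def information_content(c):
    """IC(c) = log(1 / p(c)): rarer, more specific concepts carry more information."""
    return math.log(1 / probability(c))


def ic_similarity(c1, c2):
    """IC of the most informative parent concept shared by c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)


print(round(ic_similarity("sci_fi_movie", "comedy_movie"), 3))  # 0.511
print(ic_similarity("sci_fi_movie", "tv_series"))              # 0.0 (only the root is shared)
```

Two movie genres share the fairly specific parent "movie" and score higher than a movie genre and a TV series, whose only common parent is the root.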

Concept-based distance method

Using statistics from a large-scale corpus

Using an ontology [not easy to understand]

  • Initial semantic similarity level: a predetermined concept similarity value, i.e. the similarity reflected by the parent-child (hypernym-hyponym) relationship between concepts
  • Non-hierarchical relationship similarity level: starting from the initial similarity, compute the similarity reflected by relationships other than parent-child ones
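As an illustration of the concept distance idea applied to an ontology's parent-child hierarchy (roughly the initial similarity level above, without the non-hierarchical refinement), the sketch below counts the is-a edges between two concepts through their lowest common ancestor and converts that distance into a similarity 1 / (1 + distance). The taxonomy and the conversion formula are assumptions.

```python
# Hypothetical is-a hierarchy (child -> parent) taken from an ontology
PARENT = {
    "movie": "video", "tv_series": "video", "variety_show": "video",
    "sci_fi_movie": "movie", "comedy_movie": "movie",
}


def path_to_root(c):
    """c followed by each parent concept up to the root."""
    path = [c]
    while c in PARENT:
        c = PARENT[c]
        path.append(c)
    return path


def concept_distance(c1, c2):
    """Number of is-a edges between c1 and c2 through their lowest common ancestor."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    steps_in_p2 = {c: i for i, c in enumerate(p2)}
    for i, c in enumerate(p1):  # first shared concept = lowest common ancestor
        if c in steps_in_p2:
            return i + steps_in_p2[c]
    raise ValueError("concepts have no common ancestor")


def distance_similarity(c1, c2):
    """Shorter paths in the hierarchy mean more similar concepts."""
    return 1.0 / (1.0 + concept_distance(c1, c2))


print(concept_distance("sci_fi_movie", "comedy_movie"))  # 2 (via "movie")
print(concept_distance("sci_fi_movie", "tv_series"))     # 3 (via "video")
```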

The final similarity is a weighted sum:
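The weighting formula itself is not given in these notes. A minimal sketch, assuming the final similarity is a normalized weighted sum of partial similarities (for example a quantitative part and a qualitative part); the part names and the weights are hypothetical.

```python
def weighted_similarity(partial, weights):
    """Normalized weighted sum of partial similarities.

    partial : part name -> similarity in [0, 1]
    weights : part name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in partial)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * s for name, s in partial.items()) / total_weight


sim = weighted_similarity(
    {"quantitative": 0.8, "qualitative": 0.5},
    {"quantitative": 0.4, "qualitative": 0.6},
)
print(round(sim, 2))  # 0.62
```

Dividing by the total weight keeps the result on the same scale as the inputs even if the weights do not sum to 1.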

User portrait clustering

In k-means, the distance from each element to the cluster centers can be computed independently and in parallel, so k-means can be implemented with MapReduce to achieve high performance.
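A single-process sketch of how one k-means iteration splits into MapReduce phases. `map_phase` assigns each point to its nearest center independently (the parallelizable step described above), and `reduce_phase` averages the points per center to produce new centers. A real deployment would run these phases on a MapReduce framework such as Hadoop; here they are plain Python functions, and the data and random seed are hypothetical.

```python
import math
import random
from collections import defaultdict


def nearest(point, centers):
    """Index of the closest center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def map_phase(points, centers):
    """Map: each point independently emits (center_id, (point, 1))."""
    for p in points:
        yield nearest(p, centers), (p, 1)


def reduce_phase(pairs, dim):
    """Reduce: per center id, sum coordinates and counts, then average."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, (p, n) in pairs:
        for d in range(dim):
            sums[cid][d] += p[d]
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans_mapreduce(points, k, iters=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    dim = len(points[0])
    for _ in range(iters):
        new = reduce_phase(map_phase(points, centers), dim)
        # keep the old center if no point was assigned to it in this round
        centers = [new.get(j, centers[j]) for j in range(k)]
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans_mapreduce(points, k=2)))
```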

Chapter Four User Portrait Management

User portrait representation forms

Keyword method

Scoring matrix

Vector space representation

Ontological representation
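To show how the keyword method relates to the scoring matrix and vector space representations, here is a small sketch that turns weighted keywords into vectors over a fixed vocabulary, stacks them into a user-by-tag matrix, and compares users with cosine similarity. The vocabulary and weights are hypothetical, and tags stand in for items as the matrix columns.

```python
import math

# Hypothetical tag vocabulary shared by all users
VOCAB = ["movie", "tv_series", "variety_show", "sci_fi", "comedy"]


def keywords_to_vector(keywords):
    """Keyword method -> vector space: one dimension per vocabulary tag (0.0 if absent)."""
    return [float(keywords.get(tag, 0.0)) for tag in VOCAB]


def scoring_matrix(users):
    """User-by-tag matrix: one row per user (sorted by id), one column per vocabulary tag."""
    ids = sorted(users)
    return ids, [keywords_to_vector(users[u]) for u in ids]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


users = {
    "u1": {"movie": 3, "sci_fi": 2},
    "u2": {"movie": 1, "comedy": 4},
}
ids, matrix = scoring_matrix(users)
print(ids, matrix[0])  # ['u1', 'u2'] [3.0, 0.0, 0.0, 2.0, 0.0]
print(round(cosine(matrix[0], matrix[1]), 3))
```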

Storage mechanism

Relational databases

Two-dimensional tables with row-oriented storage

Non-relational databases

Key-value stores: based on hashing, loosely coupled, with fast queries. Example: Redis

Column-oriented databases: designed for the massive data of distributed storage; data that a relational database would hold row by row is stored column by column instead. Examples: HBase, Druid

Big data workloads focus on data filtering and statistics.

With row storage, computing the total of a column requires scanning the entire table and reading every row. With column storage, the values of each column are kept together, i.e. data of the same attribute is stored contiguously, which reduces disk I/O and speeds up processing. When values repeat, the column can also be compressed to save storage.
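The row-versus-column trade-off can be illustrated with plain Python lists. This is only a conceptual sketch (real column stores handle on-disk layout and encoding themselves): summing one attribute touches every record in the row layout but only one list in the column layout, and run-length encoding shows why repeated values in a column compress well. The records and column names are hypothetical.

```python
from itertools import groupby

# Hypothetical user-portrait records, row-oriented: one tuple per user
COLUMNS = ["user_id", "gender", "age", "watch_hours"]
rows = [
    ("u1", "female", 25, 3.5),
    ("u2", "male", 31, 1.0),
    ("u3", "female", 25, 2.0),
]


def to_columns(rows):
    """Column-oriented layout: all values of one attribute stored contiguously."""
    return {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}


def sum_row_store(rows, col):
    """Row store: every full record is visited just to use one field."""
    i = COLUMNS.index(col)
    return sum(r[i] for r in rows)


def sum_column_store(columns, col):
    """Column store: only that column's contiguous values are read."""
    return sum(columns[col])


def run_length_encode(values):
    """Simple compression for repeated values: [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]


columns = to_columns(rows)
print(sum_row_store(rows, "watch_hours"), sum_column_store(columns, "watch_hours"))  # 6.5 6.5
print(run_length_encode(sorted(columns["gender"])))  # [('female', 2), ('male', 1)]
```

Run-length encoding only pays off when equal values sit next to each other, which is why the example sorts the column first.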

Document databases: similar to key-value stores, but the different fields within a document can be used for complex association operations. Example: MongoDB

Graph databases

User portraits are mainly stored in column-oriented and key-value databases.

Data warehouse: a subject-oriented, integrated, and time-variant collection of data

Query mechanism

Querying by user ID is the most common and most basic operation, and its efficiency largely determines the performance of the recommender system.

Concurrent queries

Caching mechanism
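A minimal sketch combining the two query-side techniques: a thread pool looks up many user IDs concurrently, and `functools.lru_cache` keeps recently queried portraits in memory. The store, the simulated latency, and the function names are hypothetical. After a portrait is updated, the cache must be invalidated, for example with `get_portrait.cache_clear()`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical backing store standing in for the portrait database
STORE = {f"u{i}": {"tags": ["movie"] if i % 2 else ["tv_series"]} for i in range(100)}
store_hits = 0
_hits_lock = threading.Lock()


def slow_fetch(user_id):
    """Simulated database lookup by ID with artificial latency."""
    global store_hits
    with _hits_lock:
        store_hits += 1
    time.sleep(0.01)
    return STORE.get(user_id)


@lru_cache(maxsize=10_000)
def get_portrait(user_id):
    """Caching: repeated lookups of the same ID are answered from memory."""
    return slow_fetch(user_id)


def get_portraits(user_ids, workers=8):
    """Concurrent query: fetch many IDs in parallel threads, results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(get_portrait, user_ids))


print(get_portraits(["u1", "u2", "u3"]))
get_portrait("u1")  # served from the cache, no store access
print(store_hits)   # 3
```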

Regular update mechanism

How to obtain user portrait data that changes in real time

How to set appropriate conditions for triggering a user portrait update

Efficient update algorithms

Obtaining user information

Static information: relatively stable

Dynamic information

Update trigger conditions

Threshold: set a threshold and decide based on how the amount of newly acquired real-time portrait data compares with it

Time period: update at a set interval

Comparison: first mine a user portrait from the newly added data, compare it with the original portrait, and then decide whether to update
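The three trigger conditions can be combined in one decision function. In this sketch (an assumption, not the source's algorithm) an update fires if any condition holds, and the comparison step measures change as 1 minus the Jaccard similarity between the old tag set and the newly mined one. The class name and all thresholds are hypothetical defaults.

```python
import time
from dataclasses import dataclass, field


def tag_change(old_tags, new_tags):
    """1 - Jaccard similarity between the old portrait's tags and the newly mined tags."""
    old, new = set(old_tags), set(new_tags)
    if not old and not new:
        return 0.0
    return 1.0 - len(old & new) / len(old | new)


@dataclass
class UpdateTrigger:
    """Decides when to rebuild a portrait; all thresholds are hypothetical."""
    count_threshold: int = 100         # threshold on the amount of new data
    period_seconds: float = 24 * 3600  # fixed time period
    change_threshold: float = 0.2      # minimum change between old and new portrait
    pending: int = 0
    last_update: float = field(default_factory=time.time)

    def on_new_event(self):
        self.pending += 1

    def should_update(self, old_tags, new_tags, now=None):
        now = time.time() if now is None else now
        if self.pending >= self.count_threshold:
            return True
        if now - self.last_update >= self.period_seconds:
            return True
        return tag_change(old_tags, new_tags) >= self.change_threshold

    def mark_updated(self, now=None):
        self.pending = 0
        self.last_update = time.time() if now is None else now


trigger = UpdateTrigger(last_update=0.0)
print(trigger.should_update({"movie"}, {"movie"}, now=10.0))                      # False
print(trigger.should_update({"movie", "sci_fi"}, {"movie", "comedy"}, now=10.0))  # True
```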

Update mechanism

Full update: computationally intensive and time-consuming

Incremental update: requires less computation (e.g. a time sliding window filtering algorithm) and is widely used
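One way to realize incremental updating with a time sliding window, under stated assumptions: events arrive with non-decreasing timestamps, each event contributes one tag, and the portrait is the tag count within the last `window` time units. Each new event adds its tag and expires events that fell out of the window, so the work per event depends on how many events expire rather than on the full history. The class name and parameters are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowPortrait:
    """Hypothetical interest portrait maintained incrementally over a time window."""

    def __init__(self, window):
        self.window = window
        self.events = deque()    # (timestamp, tag) in arrival order
        self.counts = Counter()  # tag -> occurrences inside the window

    def add(self, ts, tag):
        """Incremental step: count the new event, then drop events older than the window."""
        self.events.append((ts, tag))
        self.counts[tag] += 1
        self._expire(ts)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def top(self, n=3):
        return [tag for tag, _ in self.counts.most_common(n)]


w = SlidingWindowPortrait(window=10)
w.add(0, "movie")
w.add(5, "movie")
w.add(8, "comedy")
print(w.top(1))  # ['movie']
w.add(12, "comedy")  # the event at t=0 leaves the window
print(w.top(1))  # ['comedy']
```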

To be continued...