User portrait: a depiction of a class of similar users along different dimensions
User role: the distinct roles that different users play in the business system
User attributes: descriptions of the user's attributes, such as gender, age...
User attributes: static portrait
User behavior: dynamic portrait
Time and space constraints: in time, the goal is to provide personalized services by depicting users accurately; in space, different domains have different focuses, so a user portrait should be designed according to the characteristics of its own domain.
Chapter Two Portrait Modeling
Portrait modeling is the labeling of user information
The core of user portrait modeling is to express and store users' latent intentions and interests: from the user's basic information, video information, access information, behavioral preferences, and implicit interests, it derives a user model that a computer can read and compute with.
The two parts of a user portrait
Qualitative portrait: basic user characteristics, behavior descriptions, interest models, video representations, etc.
Labels (tags) are the core of the qualitative portrait
Examples are age tags and geographic tags. Tags have two important presentation features: they are semantic and they are short text. Semantics make tags understandable to people, and short text reduces preprocessing and makes it easy for computers to extract and aggregate tags for analysis.
Quantitative portrait: quantifiable data features, such as basic user variables and interest preferences
Pay attention to the granularity of the portrait: the finer and more specific it is, the more detailed the portrait, but the higher the modeling cost
The granularity should therefore be moderate. Forms are then used to capture user behavior and to store and analyze the data; a form is the most direct way to present the collected data.
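A minimal sketch of how the two parts of a portrait might sit side by side in one record. The class name `UserPortrait`, its fields, and the tag strings are hypothetical illustrations, not a format from the source: tags are kept as short semantic strings (qualitative part) and numeric features as a name-to-value map (quantitative part).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserPortrait:
    """Hypothetical container for one user's portrait."""
    user_id: str
    # Qualitative part: short, semantic tags such as "age:25-34".
    tags: set[str] = field(default_factory=set)
    # Quantitative part: numeric features such as age or interest weights.
    features: dict[str, float] = field(default_factory=dict)

    def add_tag(self, tag: str) -> None:
        self.tags.add(tag)

    def set_feature(self, name: str, value: float) -> None:
        self.features[name] = value


p = UserPortrait("u1")
p.add_tag("genre:variety_show")
p.set_feature("age", 28)
p.set_feature("interest:movies", 0.7)
```

Keeping tags as plain strings follows the "short text" point above: they can be counted and aggregated without further preprocessing.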
A recommendation system has two types of elements: users and items
Constructing the qualitative user portrait
An ontology is used to represent, verify, reason about, and interpret the tags in the user portrait domain.
An ontology consists of classes, attributes, instances, axioms, and inference rules
The key steps of ontology construction
Build the domain vocabulary: the various tags
Determine the categories and the structure between them: define the category relationships through hierarchical subdivision; for example, videos are divided into TV series, movies, variety shows, and so on
Define attributes: object attributes and data attributes
Define instances: entities, classes, and attributes form the "skeleton" of the ontology, and instances are its "flesh and blood".
Define constraint axioms and inference rules: the constraints between class concepts
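The construction steps above can be illustrated with a toy in-memory ontology. This is a hypothetical sketch, not a real ontology language such as OWL; the `Ontology` class, its methods, and the video classes are assumptions. It covers classes with a hierarchy, instances with data attributes, a disjointness constraint as the axiom, and subclass transitivity as the inference rule.

```python
class Ontology:
    """Toy ontology: classes, a class hierarchy, instances with data
    attributes, disjointness axioms, and subclass-transitivity inference."""

    def __init__(self):
        self.parent = {}       # class -> direct superclass (None for a root)
        self.instances = {}    # instance -> (asserted classes, data attributes)
        self.disjoint = set()  # axioms: frozenset({class_a, class_b})

    def add_class(self, name, parent=None):
        self.parent[name] = parent

    def add_disjoint(self, a, b):
        self.disjoint.add(frozenset((a, b)))

    def add_instance(self, name, classes, **attributes):
        self.instances[name] = (list(classes), attributes)

    def ancestors(self, cls):
        # Inference rule: subClassOf is transitive, so walk up to the root.
        result = []
        while cls is not None:
            result.append(cls)
            cls = self.parent[cls]
        return result

    def types_of(self, instance):
        # An instance belongs to every asserted class and all of their ancestors.
        classes, _ = self.instances[instance]
        return {c for cls in classes for c in self.ancestors(cls)}

    def is_consistent(self, instance):
        # Constraint check: no instance may belong to two disjoint classes.
        types = self.types_of(instance)
        return not any(pair <= types for pair in self.disjoint)


onto = Ontology()
onto.add_class("Video")
for sub in ("TVSeries", "Movie", "VarietyShow"):
    onto.add_class(sub, parent="Video")
onto.add_disjoint("TVSeries", "Movie")
onto.add_instance("show_42", ["VarietyShow"], year=2020)
onto.add_instance("odd_item", ["TVSeries", "Movie"])
```

Here `show_42` is inferred to be a `Video` although only `VarietyShow` was asserted, and `odd_item` violates the disjointness axiom.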
Chapter Three Group User Profile Analysis
Because a recommendation system has a large number of users, it is impractical to build a specific portrait for every user. User portrait analysis should therefore cover not only individual target users but also the correlations between users, that is, group user portrait analysis. A group portrait depicts a set of real users, which helps designers discover the differentiated characteristics within the group and provide targeted services based on those differences.
User portrait acquisition
User portrait similarity calculation: compute the similarity between different user portraits; this identifies the key indicators that distinguish user groups and is the prerequisite for clustering
User portrait clustering: cluster users based on their similarity
Group user portrait generation: build a representative, typical portrait for each type of user
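One simple way to carry out the last step, sketched under assumptions: given the members of one cluster, keep the tags that at least a given share of members carry and average the numeric features. The function `group_portrait`, the input dict layout, and the 50% cutoff are hypothetical.

```python
from collections import Counter
from statistics import mean


def group_portrait(members, tag_share=0.5):
    """Build a representative portrait for one cluster of users.

    members: list of dicts {"tags": set[str], "features": dict[str, float]}
    tag_share: hypothetical cutoff; a tag is kept if at least this share
    of the members carries it.
    """
    n = len(members)
    tag_counts = Counter(tag for m in members for tag in m["tags"])
    tags = {t for t, c in tag_counts.items() if c / n >= tag_share}
    # Union of all feature names that appear in any member.
    feature_names = set().union(*(m["features"] for m in members))
    # Average each feature over the members that have it.
    features = {
        name: mean(m["features"][name] for m in members if name in m["features"])
        for name in feature_names
    }
    return {"size": n, "tags": tags, "features": features}


cluster = [
    {"tags": {"movies", "age:25-34"}, "features": {"age": 26}},
    {"tags": {"movies", "sports"}, "features": {"age": 30}},
    {"tags": {"movies", "age:25-34"}, "features": {"age": 31}},
]
g = group_portrait(cluster)
```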
User portrait similarity
Quantitative similarity calculation
These features are usually numeric, such as age, region, and other data. The similarity is calculated as:
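The following is an assumed stand-in, not necessarily the formula intended above: one common choice normalizes the absolute difference of each numeric feature by its value range and averages the per-feature similarities. `numeric_similarity` and its `ranges` argument are hypothetical names.

```python
def numeric_similarity(a, b, ranges):
    """Similarity of two users' numeric features, in [0, 1].

    Assumed formula: for each shared feature f with range [lo, hi],
    sim_f = 1 - |a_f - b_f| / (hi - lo); the result is the mean of sim_f.
    """
    shared = [f for f in a if f in b and f in ranges]
    if not shared:
        return 0.0
    total = 0.0
    for f in shared:
        lo, hi = ranges[f]
        # Clamp at 0 in case a value lies outside the assumed range.
        total += max(0.0, 1.0 - abs(a[f] - b[f]) / (hi - lo))
    return total / len(shared)


user_a = {"age": 20, "income": 5000}
user_b = {"age": 30, "income": 5000}
sim = numeric_similarity(user_a, user_b, {"age": (0, 100), "income": (0, 10000)})
```

With these inputs the age similarity is 0.9 and the income similarity is 1.0, so `sim` is about 0.95.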
Qualitative similarity calculation
Qualitative features are expressed as labels and have no exact value, so their similarity is calculated as follows:
Method 1: label quantification, which converts each label concept into a quantitative value
Method 2: compute the similarity directly from the concepts
Information-content method: determine the similarity of two concepts from the information content of their common parent concept.
Concept-based distance method
Statistics over a large-scale corpus
Computation using the ontology [not easy to understand]
- Initial semantic similarity of concepts: a predetermined similarity value for each pair of concepts, i.e., the similarity reflected by their superordinate/subordinate (hypernym/hyponym) relationship
- Non-subordinate relationship similarity: computed from the initial similarity, reflecting relationships between concepts other than superordinate/subordinate ones
The final similarity is obtained as a weighted sum:
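To make Method 2 concrete, here is a sketch over a hypothetical video taxonomy with made-up corpus counts. For the information-content method it uses Lin's measure, a named variant swapped in here that normalizes the common parent's information content by that of the two concepts. For the distance method it uses 1 / (1 + edge count), and it then combines the two with a weighted sum. Which terms the source actually weights, and with what weights, is not stated, so the 0.5/0.5 weights are assumptions.

```python
import math

# Hypothetical taxonomy: concept -> parent concept (the root has None).
PARENT = {
    "Video": None,
    "Movie": "Video",
    "TVSeries": "Video",
    "ActionMovie": "Movie",
    "ComedyMovie": "Movie",
}
# Hypothetical corpus frequencies (own occurrences of each concept only).
COUNTS = {"Video": 10, "Movie": 20, "TVSeries": 30, "ActionMovie": 25, "ComedyMovie": 15}


def path_to_root(c):
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path


def subsumed_count(c):
    # A concept's frequency includes the occurrences of all its descendants.
    return sum(n for concept, n in COUNTS.items() if c in path_to_root(concept))


TOTAL = subsumed_count("Video")


def ic(c):
    # Information content: IC(c) = -log p(c).
    return -math.log(subsumed_count(c) / TOTAL)


def lowest_common_parent(a, b):
    ancestors_a = path_to_root(a)
    for c in path_to_root(b):
        if c in ancestors_a:
            return c
    return None


def lin_similarity(a, b):
    denom = ic(a) + ic(b)
    if denom == 0:
        return 1.0  # both concepts are the root
    return 2 * ic(lowest_common_parent(a, b)) / denom


def distance_similarity(a, b):
    # Number of edges from a and from b up to their lowest common parent.
    lcp = lowest_common_parent(a, b)
    dist = path_to_root(a).index(lcp) + path_to_root(b).index(lcp)
    return 1.0 / (1.0 + dist)


def concept_similarity(a, b, w_ic=0.5, w_dist=0.5):
    # Weighted sum; the default weights are assumptions.
    return w_ic * lin_similarity(a, b) + w_dist * distance_similarity(a, b)
```

Two movie genres share the informative parent `Movie` and therefore score higher than a movie genre and `TVSeries`, whose only common parent is the root.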
User portrait clustering
The distances from each element to the cluster centers are computed independently and can run in parallel, so k-means can be expressed as MapReduce jobs to achieve high performance.
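The following sketch imitates that MapReduce structure in plain Python rather than running on a real framework such as Hadoop. The map step assigns each user vector to its nearest center independently, which is the parallelizable part, and the reduce step averages the vectors per cluster. Function names and the toy data are hypothetical.

```python
import math


def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def map_phase(points, centers):
    # Map: each point is assigned independently, so this loop could be split
    # across workers. Emits (cluster_id, (vector, count)).
    for p in points:
        yield nearest(p, centers), (p, 1)


def reduce_phase(pairs):
    # Reduce: sum vectors and counts per cluster, then average.
    sums, counts = {}, {}
    for cid, (vec, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
            counts[cid] = 0
        sums[cid] = [s + v for s, v in zip(sums[cid], vec)]
        counts[cid] += n
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def kmeans(points, centers, iterations=10):
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        new = reduce_phase(map_phase(points, centers))
        # Keep the old center if a cluster received no points.
        updated = [new.get(i, centers[i]) for i in range(len(centers))]
        if updated == centers:
            break
        centers = updated
    return centers


users = [(0, 0), (0, 1), (10, 10), (10, 11)]
result = kmeans(users, [(0, 0), (10, 10)])
```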
Chapter Four User Portrait Management
User portrait representation and storage forms
Vector space representation
Two-dimensional table, row storage
Key-value store: based on hashing, loosely coupled, fast queries; e.g., Redis
Column-oriented database: designed for massive, distributed data, it stores relational data in units of columns; e.g., HBase, Druid
Big data workloads focus on data filtering and statistics
With row storage, computing the total of one column requires scanning the whole table and reading every row. With column storage, the data of each column is kept together, i.e., values of the same attribute are stored contiguously, which reduces disk I/O and speeds up processing. Repeated values can also be compressed to save storage.
Document database: similar to key-value storage, but different fields within a document can be used for complex association operations; e.g., MongoDB
User portraits are mainly stored in column-oriented and key-value databases
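The row-versus-column argument above can be illustrated in memory. This sketch only mimics the two data layouts and does not measure disk I/O; the field names are hypothetical. It converts row records into columns, aggregates a single column, and run-length encodes a column with repeated values as a simple example of compression.

```python
from itertools import groupby

# Row layout: one record per user (hypothetical portrait fields).
rows = [
    {"user_id": 1, "gender": "F", "age": 25},
    {"user_id": 2, "gender": "F", "age": 31},
    {"user_id": 3, "gender": "M", "age": 40},
    {"user_id": 4, "gender": "M", "age": 22},
]


def to_columns(records):
    # Column layout: values of the same attribute stored contiguously.
    return {key: [r[key] for r in records] for key in records[0]}


def run_length_encode(values):
    # Adjacent repeated values compress to (value, run_length) pairs.
    return [(v, len(list(g))) for v, g in groupby(values)]


columns = to_columns(rows)
# Row layout: summing age walks through every full record.
row_total = sum(r["age"] for r in rows)
# Column layout: summing age reads only the "age" column.
col_total = sum(columns["age"])
encoded_gender = run_length_encode(columns["gender"])
```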
A data warehouse is a subject-oriented, integrated, and time-variant collection of data
Querying a portrait by user ID is the most common and most basic operation; its efficiency largely determines the performance of the recommender system.
Regular update mechanism
How to obtain user portrait data that changes in real time
How to set appropriate conditions for triggering a portrait update
An efficient update algorithm
Get user information
Static information: relatively stable
Update trigger conditions
Threshold: trigger an update when the amount of real-time portrait data collected reaches a set threshold
Time period: update at a fixed interval
Change detection: first mine a portrait from the newly added data and compare it with the existing portrait, then decide whether to update
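A hypothetical sketch that checks the three trigger conditions in order. The function name, parameters, and default values are assumptions. The change-detection step uses the Jaccard distance between the old tag set and the newly mined tag set as one possible comparison.

```python
import time


def should_update(new_events, last_update_ts, now=None,
                  count_threshold=100, period_seconds=24 * 3600,
                  old_tags=None, new_tags=None, change_threshold=0.3):
    """Return True if any of the three trigger conditions fires.
    All parameter names and defaults are hypothetical."""
    now = time.time() if now is None else now
    # 1. Threshold: enough new real-time portrait data has accumulated.
    if len(new_events) >= count_threshold:
        return True
    # 2. Time period: a fixed interval has passed since the last update.
    if now - last_update_ts >= period_seconds:
        return True
    # 3. Change detection: the portrait mined from new data differs enough
    #    from the existing one (Jaccard distance between tag sets).
    if old_tags is not None and new_tags is not None:
        union = old_tags | new_tags
        if union:
            jaccard = len(old_tags & new_tags) / len(union)
            if 1 - jaccard >= change_threshold:
                return True
    return False
```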
Full update: computationally intensive and time-consuming
Incremental update: less computation, e.g., a time sliding-window filtering algorithm; widely used
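One simple reading of the time sliding-window idea, sketched under assumptions: the portrait is a set of tag counts over the last `window` seconds. It is updated incrementally as events arrive and as old events fall out of the window. The class and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class SlidingWindowPortrait:
    """Tag counts over the last `window` seconds, maintained incrementally."""

    def __init__(self, window):
        self.window = window
        self.events = deque()   # (timestamp, tag) in arrival order
        self.counts = Counter()

    def add(self, timestamp, tag):
        # Incremental step: only the new event and any expired events are
        # touched, instead of recomputing the portrait from full history.
        self.events.append((timestamp, tag))
        self.counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that are at least `window` seconds old.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def top_tags(self, k=3):
        return [tag for tag, _ in self.counts.most_common(k)]


portrait = SlidingWindowPortrait(window=10)
portrait.add(0, "movies")
portrait.add(5, "sports")
portrait.add(6, "movies")
portrait.add(12, "sports")  # the event at t=0 falls out of the window
```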
To be continued...