After spending a few minutes analyzing the CSDN rankings in various fields, here is what I found

I found some surprising secrets in the CSDN rankings. Come in and take a look~

This article avoids obscure, complicated techniques. It uses only basic front-end knowledge and a few browser APIs, working inside the browser to collect the data we want and then analyze it.
I hope it makes the experts smile and think back: did you ever do a similarly "wonderful" little study yourself?
Maybe it will even bring back the pure motivation you had when you first got into tech.

1. Introduction

CSDN was the first technical community I came across, back in university. Recently it added a rankings section split into more than a dozen categories. Each article's popularity is calculated from interactions (likes, comments, bookmarks) and from time, the article is placed in its category, and the results are finally combined into per-category lists and a comprehensive list.

The community strongly encourages everyone to share and accumulate technical knowledge to help the whole tech community grow. This article grew out of an idea I had while browsing the rankings, from a user's point of view:

  • Collect the article title and author from every list
  • Merge the collected data into a new "all-fields list"
  • Analyze that all-fields list to see whether there is a "wealth password" (a secret to success), haha

In internet buzzword-speak, that would be:

  1. Find a lever in a niche segment
  2. Build a methodology and export it to the outside world
  3. Analyze your own pain points and do refined operations

2. Get started

The code below can be run directly in the browser console. We approach everything from the user's point of view so the implementation stays simple.

2.1 What can we get from one list item

Ideas:

  1. Select a single item
  2. Get the item's title, author, and popularity

Code:

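// Select a single list item (the class name on CSDN's page is "hostitem")
const hotRankItem = document.querySelector('.hostitem');
// Title (console.log is needed because the console only echoes the last expression)
console.log(hotRankItem.querySelector('a.title').innerText);
// Author
console.log(hotRankItem.querySelector('a.name').innerText);
// Popularity (hot value)
console.log(hotRankItem.querySelector('span.num').innerText);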

Result:

[Screenshot: console output]

2.2 Get the full data of a list

Ideas:

  1. Get the list items and convert the NodeList into an array
  2. Traverse the array, extract the required fields, and return a new array

Code:

function getData() {
    // Convert an array-like NodeList into a real array
    const likeArray = (arr) => Array.prototype.slice.call(arr);
    // All items currently rendered in the list
    const hotRankList = likeArray(document.querySelectorAll('.hostitem'));
    // Extract the title, author and popularity of every item
    const newHotRankList = hotRankList.map((element) => {
        const title = element.querySelector('a.title').innerText;
        const name = element.querySelector('a.name').innerText;
        const hot = element.querySelector('span.num').innerText;
        return {title, name, hot};
    });
    return newHotRankList;
}

getData();

Result:

[Screenshot: console output]

Note

Readers who browse the rankings often may have noticed that, for performance and user-experience reasons, more items are loaded only when you scroll to the bottom of the page, 25 at a time. So, like a real user, we let the code do the scrolling and reading for us~

2.3 Simulate user scrolling to get the complete list

Ideas:

  1. Use window.scrollTo() to scroll the page to the given position (here, far past the bottom).
  2. We know 25 items are loaded each time; each field list has 50 items in total, and the comprehensive list has 100.
  3. Use a timer callback to decide when to process the data: once the page height stops changing between two checks, everything has loaded.

Code:

function getData() {
    // Convert an array-like NodeList into a real array
    const likeArray = (arr) => Array.prototype.slice.call(arr);
    // All items currently rendered in the list
    const hotRankList = likeArray(document.querySelectorAll('.hostitem'));
    // Extract the title, author and popularity of every item
    const newHotRankList = hotRankList.map((element) => {
        const title = element.querySelector('a.title').innerText;
        const name = element.querySelector('a.name').innerText;
        const hot = element.querySelector('span.num').innerText;
        return {title, name, hot};
    });
    return newHotRankList;
}

function getComData() {
    let mainHeight = 0; // height of the content area at the previous check
    const timer = setInterval(() => {
        const height = document.querySelector('.main').offsetHeight;
        if (mainHeight === height) {
            // Height unchanged since the last scroll: nothing more is loading
            const res = getData();
            console.log(res);
            clearInterval(timer);
        } else {
            // Still growing: remember the height and scroll to the bottom again
            mainHeight = height;
            window.scrollTo({
                top: 100000, // larger than the page height, i.e. "the bottom"
                behavior: "smooth"
            });
        }
    }, 1000);
}

getComData();

Result:

[Screenshot: console output]
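The height check in `getComData` (scroll, wait, compare with the previous height) can be pulled out into a DOM-free helper that returns a Promise, which also makes the idea testable outside the browser. This is only a sketch under assumptions: `waitUntilStable`, `readValue` and `onChange` are hypothetical names, the fake page heights are made up, and in the browser `readValue` would read `document.querySelector('.main').offsetHeight` while `onChange` would call `window.scrollTo`.

```javascript
// Poll readValue() every intervalMs ms. While the value keeps changing, record it
// and call onChange(); once two consecutive readings are equal, stop and resolve.
// Hypothetical browser usage: readValue = () => document.querySelector('.main').offsetHeight,
// onChange = () => window.scrollTo({ top: 100000, behavior: "smooth" }).
function waitUntilStable(readValue, onChange, intervalMs = 1000) {
  return new Promise((resolve) => {
    let last; // undefined at first, so the first reading can never count as "stable"
    const timer = setInterval(() => {
      const current = readValue();
      if (current === last) {
        clearInterval(timer);
        resolve(current);
      } else {
        last = current;
        onChange(current);
      }
    }, intervalMs);
  });
}

// Simulated page (made-up numbers): each of the first three "scrolls" loads
// another batch and makes the page 500px taller; after that nothing changes.
let fakeHeight = 1000;
let scrolls = 0;
waitUntilStable(
  () => fakeHeight,
  () => {
    if (scrolls < 3) {
      fakeHeight += 500;
      scrolls++;
    }
  },
  10
).then((finalHeight) => console.log(finalHeight, scrolls)); // 2500 3
```

Like the original, this treats "two equal readings one interval apart" as "everything has loaded", so on a slow connection a longer interval may be needed to avoid stopping too early.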

2.4 Get the ranking data for all fields

All later examples assume the code in this section has already been run.

Likewise, if I were a user who wanted to see every list, I would:

  1. Browse the C/C++ list and scroll to the bottom (finished reading)
  2. Click the Java category and repeat step 1, and so on for every category
  3. Note: this time a Promise is used to wait until each list is fully loaded

Ideas:

  1. Loop over the category tabs, simulating user clicks
  2. On each new list, repeat the scrolling from section 2.3
  3. Combine the results of all lists, tagging each item with its list name so they can be told apart

Code:
// Convert an array-like NodeList into a real array
const likeArray = (arr) => Array.prototype.slice.call(arr);

function getData() {
    // All items currently rendered in the active list
    const hotRankList = likeArray(document.querySelectorAll('.hostitem'));
    // Name of the active list (the highlighted tab)
    const curBlogRankName = document.querySelector('.blog-rank-right-top li.active').innerText;
    console.log(`Fetching data for the ${curBlogRankName} list`);
    // Extract the fields and tag each item with its list name
    const newHotRankList = hotRankList.map((element) => {
        const title = element.querySelector('a.title').innerText;
        const name = element.querySelector('a.name').innerText;
        const hot = element.querySelector('span.num').innerText;
        return {title, name, hot, curBlogRankName};
    });
    return newHotRankList;
}

function getComData() {
    let mainHeight = 0; // height of the content area at the previous check
    return new Promise((resolve) => {
        const timer = setInterval(() => {
            const height = document.querySelector('.main').offsetHeight;
            if (mainHeight === height) {
                // Height stopped changing: the list is fully loaded
                clearInterval(timer);
                resolve(getData());
            } else {
                // Still loading: record the height and scroll to the bottom again
                mainHeight = height;
                window.scrollTo({
                    top: 100000,
                    behavior: "smooth"
                });
            }
        }, 1000);
    });
}

// Click through every list tab and collect each list's data
async function loopNav() {
    const result = {};
    const navList = likeArray(document.querySelectorAll('.blog-rank-right-top ul li'));
    for (let i = 0; i < navList.length; i++) {
        navList[i].click(); // switch to the next list
        const res = await getComData(); // wait until it is fully loaded
        result[navList[i].innerText] = res;
    }
    console.log('Summary of all list results:', result);
    window.rankResult = result; // expose the result on window for later sections
}

loopNav();

Result:

[Screenshot: console output]
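The key point of `loopNav` is that the tabs are visited one at a time: each `await` finishes scrolling one list before the next tab is clicked, since all lists share the same page. Here is a DOM-free sketch of that sequential pattern; the helper name `collectSequentially` and the fake tabs are hypothetical, made up for illustration.

```javascript
// Visit items strictly one after another: each visit must finish (its Promise
// must settle) before the next one starts. Results are keyed by keyOf(item).
async function collectSequentially(items, keyOf, visit) {
  const result = {};
  for (const item of items) {
    result[keyOf(item)] = await visit(item);
  }
  return result;
}

// Fake "tabs" and a fake visit that pretends to load a list after 10 ms.
const fakeTabs = ["C/C++", "Java"];
const fakeVisit = (name) =>
  new Promise((resolve) =>
    setTimeout(() => resolve([{ title: `${name} article`, curBlogRankName: name }]), 10)
  );

collectSequentially(fakeTabs, (tab) => tab, fakeVisit).then((result) =>
  console.log(Object.keys(result)) // [ 'C/C++', 'Java' ]
);
```

In the browser, the equivalent hypothetical call would be roughly `collectSequentially(navList, (li) => li.innerText, (li) => { li.click(); return getComData(); })`. Starting all visits in parallel (for example with `Promise.all`) would not work here, because every click replaces the list currently on the page.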

2.5 Get the data of a single list

If you only want the content of one particular list, for example the CSDN site-wide comprehensive hot list, open the console on that page and run the code from section 2.3.

You will get the data as an array.

3. Put the data to work

The data we get looks like this. Next we process it following the ideas from Chapter 1.

collection: {
    "list name 1": [
        {article 1}, {article 2}, ...
    ],
    "list name 2": [
        {article 1}, {article 2}, ...
    ]
}
Someone may ask why we don't just look at the comprehensive hot list directly. The reason is that the single-field lists and the comprehensive hot list are updated at different times; in practice you can pick either one and dig deeper.

3.1 Merge all lists and take the top 100

Ideas:

  1. Merge all lists
  2. Sort by popularity
  3. Take the top 100

Code:

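// Merge all lists, sort by popularity (descending) and keep the top 100.
// `hot` is text scraped from the page; the `-` operator converts it to a number.
window.rankResult100 = Object.keys(rankResult)
    .reduce((prev, next) => prev.concat(rankResult[next]), [])
    .sort((a, b) => b.hot - a.hot)
    .slice(0, 100);
console.log('Top 100 by popularity (updated daily):', window.rankResult100);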

Result:

[Screenshot: console output]

As you can see (emmm), the top spot goes to an article from the operations and maintenance list, by <Programming Ape is Xiaohe>.
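For readers who prefer a pure function, here is a sketch of the same merge → sort → top-N pipeline that converts `hot` to a number explicitly instead of relying on the `-` operator's implicit conversion. It assumes `hot` is a plain digit string such as "12345" (text with separators or units would need extra parsing); `topN` is a hypothetical name and the sample data is made up.

```javascript
// Merge every list, convert `hot` to a number, sort descending, keep the top n.
// Assumes `hot` is a plain digit string such as "12345".
function topN(rankResult, n = 100) {
  return Object.values(rankResult) // one array per list
    .flat() // merge into a single array
    .map((item) => ({ ...item, hot: Number(item.hot) })) // "300" -> 300, input untouched
    .sort((a, b) => b.hot - a.hot) // most popular first
    .slice(0, n);
}

// Made-up sample in the shape produced by loopNav()
const sample = {
  Java: [{ title: "A", name: "u1", hot: "300", curBlogRankName: "Java" }],
  Python: [
    { title: "B", name: "u2", hot: "500", curBlogRankName: "Python" },
    { title: "C", name: "u3", hot: "100", curBlogRankName: "Python" },
  ],
};

console.log(topN(sample, 2).map((item) => item.title)); // [ 'B', 'A' ]
```

Copying each item before changing `hot` keeps `window.rankResult` intact, so the raw scraped data can still be reused in later steps.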

3.2 Count how many top-100 articles each field contributes

There are many ways to do this; any approach works~

Code:

window.rankResult100 = Object.keys(rankResult)
    .reduce((prev, next) => prev.concat(rankResult[next]), [])
    .sort((a, b) => b.hot - a.hot)
    .slice(0, 100);
console.log('Top 100 by popularity (updated daily):', window.rankResult100);

// Count how many of the top 100 each list contributes
window.rankResultGroup = {};
// Use every list name as a key, starting at 0
// (forEach returns undefined, so the result is not assigned to a variable)
Array.prototype.slice.call(document.querySelectorAll('.blog-rank-right-top li'))
    .map(item => item.innerText)
    .forEach(item => window.rankResultGroup[item] = 0);
// Tally the top 100 by list name
window.rankResult100.forEach(item => window.rankResultGroup[item.curBlogRankName]++);
console.log('Number of top-100 articles per list:', window.rankResultGroup);

Result:

[Screenshot: console output]
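The counting step can also be written as a pure function. One detail worth guarding against: in the code above, if an item's `curBlogRankName` is not among the initialized keys (for example, a tab text with different whitespace), `undefined++` produces `NaN`. This sketch, with the hypothetical name `countByList` and made-up input, falls back to 0 in that case.

```javascript
// Count how many of the top items come from each list. Every known list name
// starts at 0 so lists with no entries still show up; `?? 0` keeps an unexpected
// name from turning into NaN.
function countByList(listNames, topItems) {
  const counts = Object.fromEntries(listNames.map((name) => [name, 0]));
  for (const item of topItems) {
    counts[item.curBlogRankName] = (counts[item.curBlogRankName] ?? 0) + 1;
  }
  return counts;
}

// Made-up input
console.log(
  countByList(
    ["C/C++", "Java", "Python"],
    [{ curBlogRankName: "Java" }, { curBlogRankName: "Java" }, { curBlogRankName: "Python" }]
  )
); // { 'C/C++': 0, Java: 2, Python: 1 }
```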

3.3 The “analysis”: a field's share of the top rankings is somewhat related to how widely its language is used

However you look at it, this sounds like stating the obvious.

The more people use a language:

  • the more problems they run into
  • the more people there are to solve those problems
  • the more technical knowledge gets written down

Although I'm no data-analysis professional, I looked up programming-language popularity rankings on Zhihu. According to the TIOBE programming language ranking for September 2020, the top three are C, Java and Python, as shown in the figure:

[Figure: TIOBE programming language ranking, September 2020]

Compare it with the final result we got:

[Screenshot: number of top-100 articles per list]

Hmm, not much difference.


3.4 Summary

At this point, we have:

  1. Collected the list data
  2. Merged the list data
  3. Found the top-ranked articles
  4. Counted how many of the top 100 come from each field list
  5. Done a simple analysis (corrections welcome)

Along the way, this also deepened my understanding of simulated clicks, DOM manipulation, array operations and more. I hope readers who made it this far got something out of it too. The code above certainly has room for improvement, so please leave a comment and let's discuss~

What else could come next? I look forward to you joining in and exploring your own fields with me and others, for example:

  1. Build two- and three-dimensional data models;
  2. Analyze the follower counts of the authors on the lists to extract information useful to you;
  3. Collect the publication times of listed articles to find the posting-time patterns of the top bloggers.
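As a starting point for idea 3, here is a minimal sketch that buckets articles by hour of day. It rests on a loud assumption: each item would need a hypothetical `publishTime` field, which the scripts in this article do not collect, so you would first have to scrape it from the article pages. The function name `countByHour` and the timestamps are made up.

```javascript
// Count articles per hour of day (index 0-23, local time).
// ASSUMPTION: each item has a hypothetical `publishTime` string that Date can parse;
// the scripts in this article do not collect it. Unparseable values are skipped.
function countByHour(items) {
  const buckets = new Array(24).fill(0);
  for (const { publishTime } of items) {
    const date = new Date(publishTime);
    if (!Number.isNaN(date.getTime())) {
      buckets[date.getHours()]++;
    }
  }
  return buckets;
}

// Made-up timestamps without a UTC offset, which Date parses as local time
const hours = countByHour([
  { publishTime: "2020-09-20T08:30:00" },
  { publishTime: "2020-09-21T08:05:00" },
  { publishTime: "2020-09-21T22:10:00" },
  { publishTime: "not a date" },
]);
console.log(hours[8], hours[22]); // 2 1
```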

Maybe a secret shortcut is hiding in the things you're interested in and exploring right now. Keep going!


4. Final words

This article mixes technology with everyday life; I'd call it something fun. It will also be synced to "front-end corner of the Advanced Notes".

If you've read this far, why not do one small thing and give my GitHub repository a star! Thanks♪(・ω・)ノ
