[Netdisk project log] 20210531: Seafile audit system development log

Note: The system was already finished by the time this post was written, so this post is a retrospective write-up.

Picking up where we left off

While I was working on Virus Scan I got badly stuck (about 10 hours straight with no progress). Along the way I noticed that seafevents also includes an audit system, so I decided to get that working as well.

Front-end problem

It is the same old problem: the front end is still gated by isPro. I have covered this countless times before, so I won't repeat it here.

Permissions issue

After unlocking the front-end pages, every API request failed with a permission error. The browser's web console showed 403 responses, and the seahub backend logged the corresponding errors.

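Following urls.py to the corresponding views, I found the same IsProVersion permission class restriction that I had seen in the virus scan interface. There were 5 occurrences in total, and I removed IsProVersion from all of them. The original declaration looked like this: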

permission_classes = (IsAdminUser, IsProVersion)
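Why one extra class in that tuple produces a 403 for everyone: Django REST framework allows a request only if every class in permission_classes allows it. Below is a minimal, framework-free sketch of that rule. The `Request` type, the `IS_PRO` flag, the simplified `has_permission(request)` signature, and the `check` helper are hypothetical stand-ins for illustration, not seahub's actual code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for an HTTP request."""
    is_staff: bool


IS_PRO = False  # assumption: a community (non-pro) deployment


class IsAdminUser:
    def has_permission(self, request):
        return request.is_staff


class IsProVersion:
    def has_permission(self, request):
        return IS_PRO


def check(permission_classes, request):
    """Return 200 if every permission class allows the request, else 403."""
    for cls in permission_classes:
        if not cls().has_permission(request):
            return 403
    return 200


admin = Request(is_staff=True)
before = check((IsAdminUser, IsProVersion), admin)  # pro check rejects even admins
after = check((IsAdminUser,), admin)                # pro check removed
print(before, after)
```

With the pro check in place even an administrator is rejected, which matches the blanket 403s seen above.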

First debugging

The interface was now accessible, but every value on it showed 0. I logged in as a user and created a file, but the values stayed at 0. Clearly this was not something a quick hack would fix.

Investigating the mechanism

Here is what I learned from digging into it:

  1. In addition to running scheduled tasks, seafevents also accepts "events" from seahub, for example a user logging in or a user downloading a file.
  2. The audit system's storage logic in seafevents works as follows (taking a file upload as an example): the event system listens for events; when a user uploads a file, the relevant data is first kept in a variable inside the seafevents process, and once every hour the contents of that variable are written to the database in bulk.
  3. Behaviors collected by seafevents: user login, file upload, file download, upload through a public link, download through a public link, upload by the sync client, and download by the sync client.
  4. Behaviors counted as user activity (Traffic): all upload and download behaviors, excluding user login.

From the information we collected, it is not difficult to draw the following conclusions:

  1. To make the audit system run normally, the seafevents process must be running at all times.
  2. The audit system behaves like data access with a cache in front of it, because Seafile wants to avoid the performance cost of frequent database inserts. The trade-off is that audit data is not updated in real time, since it is only written out periodically, and if the seafevents process exits unexpectedly, all unsaved data is lost.
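The buffering behavior described above can be sketched in a few lines. This is a minimal illustration using Python's sqlite3 module, not seafevents' actual implementation: the `BufferedAudit` class, the `stats` table, and the event names are all made up for the example, and a real deployment would call `flush()` from an hourly timer.

```python
import sqlite3
from collections import Counter


class BufferedAudit:
    """Hold event counts in memory and write them out in batches."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = Counter()  # lives only inside this process
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stats "
            "(event TEXT PRIMARY KEY, total INTEGER NOT NULL)"
        )

    def record(self, event):
        # Cheap in-memory update; the database is not touched here.
        self.pending[event] += 1

    def flush(self):
        # Write all buffered counts in one transaction, then reset the buffer.
        for event, n in self.pending.items():
            self.conn.execute(
                "INSERT OR IGNORE INTO stats (event, total) VALUES (?, 0)",
                (event,),
            )
            self.conn.execute(
                "UPDATE stats SET total = total + ? WHERE event = ?",
                (n, event),
            )
        self.conn.commit()
        self.pending.clear()

    def total(self, event):
        row = self.conn.execute(
            "SELECT total FROM stats WHERE event = ?", (event,)
        ).fetchone()
        return row[0] if row else 0


conn = sqlite3.connect(":memory:")
audit = BufferedAudit(conn)
audit.record("file-upload")
audit.record("file-upload")
before_flush = audit.total("file-upload")  # database has not seen the events yet
audit.flush()
after_flush = audit.total("file-upload")   # counts are visible after the flush
audit.record("file-download")              # lost if the process dies before the next flush
print(before_flush, after_flush)
```

This also shows why the first debugging attempt saw only zeros: until a flush happens, the database has nothing to display.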

Second round of debugging

In the end I did not need to change much of the original mechanism; I had mainly confirmed how it operates. During debugging, however, I found that while the other data showed up correctly once the system was operated properly, the storage usage statistics still did not work. Looking through the log, I found:

[2021-05-30 15:59:23,237] [WARNING] [TotalStorageCounter] Failed to get total storage occupation: OrgRepo

This was puzzling at first, but I tracked down the code snippet that caused the problem:

try:
    RepoSize = SeafBase.classes.RepoSize
    VirtualRepo= SeafBase.classes.VirtualRepo
    OrgRepo = SeafBase.classes.OrgRepo

    q = self.seafdb_session.query(func.sum(RepoSize.size).label("size"),
                                  OrgRepo.org_id).outerjoin(VirtualRepo,\
                                  RepoSize.repo_id==VirtualRepo.repo_id).outerjoin(OrgRepo,\
                                  RepoSize.repo_id==OrgRepo.repo_id).filter(\
                                  VirtualRepo.repo_id == None).group_by(OrgRepo.org_id)
    results = q.all()
except Exception as e:
    self.seafdb_session.close()
    self.edb_session.close()
    logging.warning('[TotalStorageCounter] Failed to get total storage occupation: %s', e)
    return
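To make the intent of that query easier to see, here is a sketch of the equivalent SQL run against toy tables in an in-memory SQLite database. The table names come from the snippet, but the reduced column lists and all sample rows are invented for illustration. The query sums repo sizes per organization and skips any repo that also appears in VirtualRepo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: only the columns used by the query are modeled here.
    CREATE TABLE RepoSize (repo_id TEXT PRIMARY KEY, size INTEGER);
    CREATE TABLE VirtualRepo (repo_id TEXT PRIMARY KEY);
    CREATE TABLE OrgRepo (org_id INTEGER, repo_id TEXT UNIQUE);

    -- Made-up sample data.
    INSERT INTO RepoSize VALUES ('r1', 100), ('r2', 50), ('r3', 7), ('r4', 30);
    INSERT INTO VirtualRepo VALUES ('r3');
    INSERT INTO OrgRepo VALUES (1, 'r1'), (1, 'r4');
""")

rows = conn.execute("""
    SELECT SUM(RepoSize.size) AS size, OrgRepo.org_id
    FROM RepoSize
    LEFT OUTER JOIN VirtualRepo ON RepoSize.repo_id = VirtualRepo.repo_id
    LEFT OUTER JOIN OrgRepo ON RepoSize.repo_id = OrgRepo.repo_id
    WHERE VirtualRepo.repo_id IS NULL
    GROUP BY OrgRepo.org_id
""").fetchall()

# Map org_id -> total size; repos that belong to no organization land under None.
usage_by_org = {org_id: size for size, org_id in rows}
print(usage_by_org)
```

Because OrgRepo is joined with an outer join, repos without an organization are still counted under a NULL org_id. The join itself still requires the OrgRepo table to exist.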

I added some intermediate debug output and narrowed it down to the following statement:

OrgRepo = SeafBase.classes.OrgRepo

It did not execute normally and threw an exception. I had never seen this idiom before, and IDEA could not jump to a definition for it. From the context, though, I inferred that the statement looks up a specific table in the database. At first I was stuck in a fixed mindset and went straight to the seahub database to look for the table, but the RepoSize and VirtualRepo tables referenced just before it could not be found there either. Still, the query below it convinced me that these names had to correspond to database tables.
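The `SeafBase.classes.X` idiom resembles SQLAlchemy's automap extension, where tables reflected from the database become attributes of `Base.classes`. Whether SeafBase is in fact an automap base is an assumption here, based only on how it is used. The stdlib-only sketch below imitates that kind of lookup with a hypothetical `ReflectedTables` helper, to show how a missing table can surface as an exception whose message is nothing but the table name, which is exactly what the log line ends with.

```python
import sqlite3


class ReflectedTables:
    """Hypothetical stand-in for a namespace of tables reflected from a database."""

    def __init__(self, conn):
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        self._names = {name for (name,) in rows}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(name)
        return name  # a real ORM would return a mapped class here


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RepoSize (repo_id TEXT, size INTEGER)")
conn.execute("CREATE TABLE VirtualRepo (repo_id TEXT)")
classes = ReflectedTables(conn)

found = classes.RepoSize  # the table exists, so the lookup succeeds
try:
    classes.OrgRepo       # the table was never created
    message = None
except Exception as e:
    message = "Failed to get total storage occupation: %s" % e
print(message)
```

Read this way, the terse "OrgRepo" at the end of the warning simply means that no table of that name was found in the reflected database.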

After searching for a while, I found that the table belongs in the seafile database. That made it easy to fix, because the seafile database definitions are also included in the pro package. So I added the following code directly to the runtime checks in the C code:

/* Create the OrgRepo table that seafevents expects, if it does not exist yet. */
sql = "CREATE TABLE IF NOT EXISTS OrgRepo ("
      "org_id int, "
      "repo_id char(36), "
      "user varchar(255), "
      "PRIMARY KEY (org_id,repo_id), "
      "UNIQUE (repo_id)) "
      "ENGINE=INNODB;";
if (seaf_db_query (db, sql) < 0)
    return -1;

After compiling and restarting, the table is created automatically. Running the seafevents script again, the OrgRepo warning no longer appears, and the seahub admin interface now displays the storage usage correctly:

[Screenshot: storage usage shown in the seahub admin interface]

OK, the audit system is also complete.