Entity Framework Core performance optimization

https://docs.microsoft.com/zh-cn/ef/core/performance/

1. Locating performance issues

1.1 The LogTo method

Use LogTo to see the generated SQL statements and how long each one took to execute.

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=Blogging;Integrated Security=True")
        .LogTo(Console.WriteLine, LogLevel.Information);
}
info: 06/12/2020 09:12:36.117 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [b].[Id], [b].[Name]
      FROM [Blogs] AS [b]
      WHERE [b].[Name] = N'foo'

This SQL statement took 4 ms to execute.

Note: Configuring LogTo like this in a production environment is not recommended: logging slows the application down, and a large volume of logs takes up storage space.

1.2 Associating SQL with LINQ queries

Even when the method above tells you which SQL statement is slow, it can be hard to tell which LINQ query produced it. Use TagWith to tag the LINQ query:

var myLocation = new Point(1, 2);
var nearestPeople = (from f in context.People.TagWith("This is my spatial query!")
                     orderby f.Location.Distance(myLocation) descending
                     select f).Take(5).ToList();

The tag will appear in the generated sql:

-- This is my spatial query!

SELECT TOP(@__p_1) [p].[Id], [p].[Location]
FROM [People] AS [p]
ORDER BY [p].[Location].STDistance(@__myLocation_0) DESC

2. Solving the problem

After finding the problematic SQL, the next step is to analyze how it executes. For SQL Server, SSMS can display the execution plan of a statement, which can then be analyzed further.


3. Benchmarking

If the same functionality can be implemented in several ways, we can use the BenchmarkDotNet library to compare them. Here is a benchmark comparing four implementations of an average calculation:

  1. Load all entities, then calculate the average
  2. Load all entities but do not enable tracking (AsNoTracking), and then calculate the average
  3. Load only the columns to be calculated, and then calculate the average
  4. Let the database calculate the average

The implementation of the four methods is as follows:

// Tracking (default)
[Benchmark]
public double LoadEntities()
{
    var sum = 0;
    var count = 0;
    using var ctx = new BloggingContext();
    foreach (var blog in ctx.Blogs)
    {
        sum += blog.Rating;
        count++;
    }
    return (double)sum / count;
}
// No tracking
[Benchmark]
public double LoadEntitiesNoTracking()
{
    var sum = 0;
    var count = 0;
    using var ctx = new BloggingContext();
    foreach (var blog in ctx.Blogs.AsNoTracking())
    {
        sum += blog.Rating;
        count++;
    }

    return (double)sum / count;
}
// Project only the Rating column
[Benchmark]
public double ProjectOnlyRanking()
{
    var sum = 0;
    var count = 0;
    using var ctx = new BloggingContext();
    foreach (var rating in ctx.Blogs.Select(b => b.Rating))
    {
        sum += rating;
        count++;
    }

    return (double)sum / count;
}
// Let the database compute the average
[Benchmark(Baseline = true)]
public double CalculateInDatabase()
{
    using var ctx = new BloggingContext();
    return ctx.Blogs.Average(b => b.Rating);
}

The results are as follows:

| Method                 | Mean       | Error    | StdDev   | Median     | Ratio | RatioSD | Gen 0    | Gen 1   | Gen 2 | Allocated  |
|------------------------|------------|----------|----------|------------|-------|---------|----------|---------|-------|------------|
| LoadEntities           | 2,860.4 μs | 54.31 μs | 93.68 μs | 2,844.5 μs | 4.55  | 0.33    | 210.9375 | 70.3125 | -     | 1309.56 KB |
| LoadEntitiesNoTracking | 1,353.0 μs | 21.26 μs | 18.85 μs | 1,355.6 μs | 2.10  | 0.14    | 87.8906  | 3.9063  | -     | 540.09 KB  |
| ProjectOnlyRanking     | 910.9 μs   | 20.91 μs | 61.65 μs | 892.9 μs   | 1.46  | 0.14    | 41.0156  | 0.9766  | -     | 252.08 KB  |
| CalculateInDatabase    | 627.1 μs   | 14.58 μs | 42.54 μs | 626.4 μs   | 1.00  | 0.00    | 4.8828   | -       | -     | 33.27 KB   |
Note: BenchmarkDotNet runs the benchmarked methods single-threaded by default.

4. Query optimization

4.1 Correct use of indexes

// Matches on start, so uses an index (on SQL Server)
var posts1 = context.Posts.Where(p => p.Title.StartsWith("A")).ToList();
// Matches on end, so does not use the index
var posts2 = context.Posts.Where(p => p.Title.EndsWith("A")).ToList();

Use database inspection tools (such as the SSMS execution plan shown earlier) to find index problems.

4.2 Query only the required properties

If you only need the Id and Url properties, don't query the others:

foreach (var blog in context.Blogs.Select(b => new { b.Id, b.Url }))
{
    Console.WriteLine("Blog: " + blog.Url);
}

The generated SQL selects only those two columns:

SELECT [b].[Id], [b].[Url]
FROM [Blogs] AS [b]

4.3 Use Take to limit the number of rows returned

var blogs25 = context.Posts
    .Where(p => p.Title.StartsWith("A"))
    .Take(25)
    .ToList();

4.4 Use AsSplitQuery to avoid Cartesian explosion

AsSplitQuery is used together with Include, and it only takes effect when a one-to-many relationship is queried; many-to-one and one-to-one relationships are not split.
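As a sketch (assuming a BloggingContext whose Blog entity has a Posts collection, as in the benchmark examples above), a split query looks like this:

```csharp
// Sketch, assuming BloggingContext with a Blog.Posts one-to-many navigation.
// Without AsSplitQuery, the JOIN would repeat every blog's columns once per
// post (the "Cartesian explosion"); with it, EF Core issues one SELECT for
// Blogs and a second SELECT for the related Posts.
using var context = new BloggingContext();
var blogs = context.Blogs
    .Include(b => b.Posts)
    .AsSplitQuery()
    .ToList();
```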

4.5 Use Include for eager loading

Include loads the related data in the same query, avoiding multiple separate SQL queries:

using (var context = new BloggingContext())
{
    var filteredBlogs = context.Blogs
        .Include(
            blog => blog.Posts
                .Where(post => post.BlogId == 1)
                .OrderByDescending(post => post.Title)
                .Take(5))
        .ToList();
}

If the related entities' data is not needed right away, consider "explicit loading" or "lazy loading" instead.
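For example, explicit loading fetches the related posts only at the point where they are needed (a sketch against the same hypothetical BloggingContext):

```csharp
// Sketch: explicit loading. The first query loads only the blog; the
// Collection(...).Load() call issues a second query for its posts on demand.
using var context = new BloggingContext();
var blog = context.Blogs.First();
context.Entry(blog).Collection(b => b.Posts).Load();
```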

4.6 Pay attention to the N+1 problem caused by "lazy loading"

Assuming there are three blogs, the following code generates four SQL statements: one to query the blogs, plus one per blog to query its posts. This is known as the N+1 problem.

foreach (var blog in context.Blogs.ToList())
{
    foreach (var post in blog.Posts)
    {
        Console.WriteLine($"Blog {blog.Url}, Post: {post.Title}");
    }
}

It can be optimized into a single SQL query:

foreach (var blog in context.Blogs.Select(b => new { b.Url, b.Posts }).ToList())
{
    foreach (var post in blog.Posts)
    {
        Console.WriteLine($"Blog {blog.Url}, Post: {post.Title}");
    }
}

4.7 Buffering and streaming

Buffering: Load all query results into memory.

Streaming: only part of the result set is held in memory at a time. Whether the query returns one row or 10,000, the memory used stays roughly constant. Consider streaming when querying large amounts of data.

// ToList and ToArray cause buffering
var blogsList = context.Posts.Where(p => p.Title.StartsWith("A")).ToList();
var blogsArray = context.Posts.Where(p => p.Title.StartsWith("A")).ToArray();

// Streaming: process one row at a time
foreach (var blog in context.Posts.Where(p => p.Title.StartsWith("A")))
{
    // ...
}

// AsEnumerable also streams
var doubleFilteredBlogs = context.Posts
    .Where(p => p.Title.StartsWith("A")) // evaluated in the database
    .AsEnumerable()
    .Where(p => SomeDotNetMethod(p)); // this part runs on the client

4.8 EF Core's internal buffering

Besides ToList and ToArray, EF Core itself buffers part of the result set in the following two situations:

  1. When a retry execution strategy is in place. This ensures that retrying the query returns the same results.
  2. When a split query is used, the result sets of all but the last SQL statement are buffered, unless MARS is enabled on SQL Server.
Note: If you use ToList together with a retry strategy, the data is buffered twice.

4.9 Use AsNoTracking queries

When you don't need to track changes to the data you query, use AsNoTracking to improve efficiency.
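A minimal sketch (again assuming the hypothetical BloggingContext):

```csharp
// Sketch: read-only query. The returned entities are not added to the change
// tracker, saving the snapshot and bookkeeping overhead.
using var context = new BloggingContext();
var blogs = context.Blogs.AsNoTracking().ToList();
```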

4.10 Execute SQL directly

In some cases, hand-written SQL performs better than what EF Core generates.
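For queries, FromSqlInterpolated lets you supply the SQL yourself while still getting entities back; the interpolated value becomes a parameter, so it is safe from SQL injection (a sketch assuming the Blogs table from the earlier examples, with a Rating column):

```csharp
// Sketch: hand-written SQL query; {minRating} is sent as a DbParameter,
// not concatenated into the SQL string.
var minRating = 3;
var goodBlogs = context.Blogs
    .FromSqlInterpolated($"SELECT * FROM [Blogs] WHERE [Rating] >= {minRating}")
    .ToList();
```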

4.11 Use asynchronous methods

Prefer asynchronous methods such as SaveChangesAsync. Synchronous methods block the thread during database I/O; when other requests come in, new threads may have to be spawned, increasing the total thread count and the number of thread switches.
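A sketch of the asynchronous counterparts (the method name RenameBlogAsync is illustrative):

```csharp
// Sketch: the thread is released back to the pool while the database I/O
// is in flight, instead of blocking on it.
public async Task RenameBlogAsync(BloggingContext context)
{
    var blog = await context.Blogs
        .FirstAsync(b => b.Url == "http://someblog.microsoft.com");
    blog.Url = "http://someotherblog.microsoft.com";
    await context.SaveChangesAsync();
}
```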

4.12 Prefer non-nullable ([Required]) properties on entities

  1. Comparing non-nullable values is faster than comparing nullable ones, so mark entity properties [Required] where possible.
  2. Prefer equality checks (==) over inequality checks (!=), because == often does not need to distinguish nulls, while != does.
  3. In queries, state explicitly that certain properties cannot be null. The second query below is faster than the first:
var query1 = context.Entities.Where(e => e.String1 != e.String2 || e.String1.Length == e.String2.Length);
var query2 = context.Entities.Where(
    e => e.String1 != null && e.String2 != null && (e.String1 != e.String2 || e.String1.Length == e.String2.Length));

5. Update optimization

5.1 Batching

var blog = context.Blogs.Single(b => b.Url == "http://someblog.microsoft.com");
blog.Url = "http://someotherblog.microsoft.com";
context.Add(new Blog { Url = "http://newblog1.microsoft.com" });
context.Add(new Blog { Url = "http://newblog2.microsoft.com" });
context.SaveChanges();

The above code generates 4 SQL round trips: one query, one update, and two inserts.

info: 2021/5/31 17:49:19.507 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (5ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT TOP(2) [b].[Id], [b].[Name], [b].[Url]
      FROM [Blogs] AS [b]
      WHERE [b].[Name] = N'fish'
info: 2021/5/31 17:49:19.517 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (1ms) [Parameters=[@p1='?' (DbType = Int64), @p0='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SET NOCOUNT ON;
      UPDATE [Blogs] SET [Url] = @p0
      WHERE [Id] = @p1;
      SELECT @@ROWCOUNT;
info: 2021/5/31 17:49:19.518 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (0ms) [Parameters=[@p0='?' (Size = 4000), @p1='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SET NOCOUNT ON;
      INSERT INTO [Blogs] ([Name], [Url])
      VALUES (@p0, @p1);
      SELECT [Id]
      FROM [Blogs]
      WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity();
info: 2021/5/31 17:49:19.520 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (0ms) [Parameters=[@p0='?' (Size = 4000), @p1='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SET NOCOUNT ON;
      INSERT INTO [Blogs] ([Name], [Url])
      VALUES (@p0, @p1);
      SELECT [Id]
      FROM [Blogs]
      WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity();

Whether EF Core puts multiple SQL statements into a single database round trip depends on the database provider. On SQL Server, for example, batching is inefficient below roughly 4 statements or above roughly 40, so by default EF Core executes at most 42 statements in a single batch and runs any remaining statements in additional round trips.

You can also configure this manually, for example enabling batching for anywhere between 1 and 100 statements:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseSqlServer(
        @"Server=(localdb)\mssqllocaldb;Database=Blogging;Integrated Security=True",
        o => o
            .MinBatchSize(1)
            .MaxBatchSize(100));
}

With this configuration, the same operations above take only two round trips:

info: 2021/5/31 17:48:37.891 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT TOP(2) [b].[Id], [b].[Name], [b].[Url]
      FROM [Blogs] AS [b]
      WHERE [b].[Name] = N'fish'
info: 2021/5/31 17:48:37.909 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (7ms) [Parameters=[@p1='?' (DbType = Int64), @p0='?' (Size = 4000), @p2='?' (Size = 4000), @p3='?' (Size = 4000), @p4='?' (Size = 4000), @p5='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SET NOCOUNT ON;
      UPDATE [Blogs] SET [Url] = @p0
      WHERE [Id] = @p1;
      SELECT @@ROWCOUNT;

      DECLARE @inserted1 TABLE ([Id] bigint, [_Position] [int]);
      MERGE [Blogs] USING (
      VALUES (@p2, @p3, 0),
      (@p4, @p5, 1)) AS i ([Name], [Url], _Position) ON 1=0
      WHEN NOT MATCHED THEN
      INSERT ([Name], [Url])
      VALUES (i.[Name], i.[Url])
      OUTPUT INSERTED.[Id], i._Position
      INTO @inserted1;

      SELECT [t].[Id] FROM [Blogs] t
      INNER JOIN @inserted1 i ON ([t].[Id] = [i].[Id])
      ORDER BY [i].[_Position];

5.2 Bulk Update

The following code

foreach(var p in context.Posts)
{
    p.Content += "a";
}
context.SaveChanges();

This causes two database round trips; the second one performs the updates:

info: 2021/5/31 17:44:32.068 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [p].[Id], [p].[BlogId], [p].[Content], [p].[IsDeleted], [p].[Title]
      FROM [Posts] AS [p]
info: 2021/5/31 17:44:32.098 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (28ms) [Parameters=[@p1='?' (DbType = Int64), @p0='?' (Size = 4000), @p3='?' (DbType = Int64), @p2='?' (Size = 4000), @p5='?' (DbType = Int64), @p4='?' (Size = 4000), @p7='?' (DbType = Int64), @p6='?' (Size = 4000), @p9='?' (DbType = Int64), @p8='?' (Size = 4000), @p11='?' (DbType = Int64), @p10='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SET NOCOUNT ON;
      UPDATE [Posts] SET [Content] = @p0
      WHERE [Id] = @p1;
      SELECT @@ROWCOUNT;

      UPDATE [Posts] SET [Content] = @p2
      WHERE [Id] = @p3;
      SELECT @@ROWCOUNT;

      UPDATE [Posts] SET [Content] = @p4
      WHERE [Id] = @p5;
      SELECT @@ROWCOUNT;

      UPDATE [Posts] SET [Content] = @p6
      WHERE [Id] = @p7;
      SELECT @@ROWCOUNT;

      UPDATE [Posts] SET [Content] = @p8
      WHERE [Id] = @p9;
      SELECT @@ROWCOUNT;

      UPDATE [Posts] SET [Content] = @p10
      WHERE [Id] = @p11;
      SELECT @@ROWCOUNT;

Although EF Core batched the second round trip, it is still not ideal. First, all columns of every post were queried and tracked. Second, on save, each entity must be compared against its snapshot to determine which properties changed, and per-row SQL is generated for those changes. Unfortunately, EF Core does not currently support bulk update, so we have to execute SQL manually:

context.Database.ExecuteSqlRaw("UPDATE [Employees] SET [Salary] = [Salary] + 1000");

6. Other ways to optimize performance

6.1 Using the DbContext pool

In ASP.NET Core, a context obtained through constructor injection (including in Razor Pages) is a new instance for each request. Reusing context instances instead of creating one per request can improve the application's throughput. Enable context pooling with AddDbContextPool:

services.AddDbContextPool<BloggingContext>(
    options => options.UseSqlServer(connectionString));

When an instance is requested, EF Core first checks whether the pool has one available. After the request is processed, the instance's state is reset and it is returned to the pool. The poolSize parameter of AddDbContextPool controls the maximum number of instances kept in the pool; once poolSize is exceeded, the pool stops caching new instances.

6.2 Query cache and parameterized query

EF Core first "compiles" the LINQ query tree into the corresponding SQL statement. Because this process is relatively expensive, EF Core caches queries by the shape of the query tree: queries with the same tree structure reuse the internally cached compilation result.

Consider the following code:

var post1 = context.Posts.FirstOrDefault(p => p.Title == "post1");
var post2 = context.Posts.FirstOrDefault(p => p.Title == "post2");

Because the LINQ query trees contain different constants, the trees differ. Each query is compiled separately, producing the following subtly different SQL:

SELECT TOP(1) [p].[Id], [p].[BlogId], [p].[Content], [p].[IsDeleted], [p].[Title]
FROM [Posts] AS [p]
WHERE [p].[Title] = N'post1'

SELECT TOP(1) [p].[Id], [p].[BlogId], [p].[Content], [p].[IsDeleted], [p].[Title]
FROM [Posts] AS [p]
WHERE [p].[Title] = N'post2'

Because the SQL differs, the query plans differ as well.

But we can do the following optimizations:

var postTitle = "post1";
var post1 = context.Posts.FirstOrDefault(p => p.Title == postTitle);
postTitle = "post2";
var post2 = context.Posts.FirstOrDefault(p => p.Title == postTitle);

Now the constant has been replaced by a parameter, so the two query trees are identical and EF Core compiles only once. The generated SQL is also parameterized, allowing the database to reuse the same query plan:

SELECT TOP(1) [p].[Id], [p].[BlogId], [p].[Content], [p].[IsDeleted], [p].[Title]
FROM [Posts] AS [p]
WHERE [p].[Title] = @__postTitle_0
Note: It is not necessary to parameterize every query; querying with constants is still fine, and EF Core will optimize. See the next chapter for details.

EF Core's event counters report the query cache hit rate. In a normal application this counter approaches 100% soon after startup, once most queries have been executed at least once. If it stays below 100%, the application is probably doing something that defeats the query cache.

How the database manages its cached query plans is database-specific. For example, SQL Server implicitly maintains an LRU query plan cache, while PostgreSQL does not (though prepared statements can achieve a very similar effect). Refer to your database's documentation for more details.