How do I read and write a 15 GB txt file with 50 million records in ASP.NET Core 6?

I want to read 50 million records from a 15 GB txt file and write them into Elasticsearch.

if (file.Length > 0)
        {
            string wwroot = _he.WebRootPath;
            string contentpath = _he.ContentRootPath;
            string path = Path.Combine(wwroot, "file/" + foldername);
            if (!Directory.Exists(path))
            {
                var rcheck = Directory.CreateDirectory(path);
            }
            var filename = file.FileName;
            var filepath = Path.Combine(path, filename);
            if (filepath.Any())
            {
                using (FileStream stream = new FileStream(Path.Combine(path, filename), FileMode.Create))
                {
                    file.CopyTo(stream);
                }
            }
            string[] lines = System.IO.File.ReadAllLines(filepath);
            var Plist = new List<Person>();
            int i = 0;
            foreach (var line in lines)
            {
                var newperson = new Person();
                string[] sub = line.Split(":");
                newperson.PId = sub[1];
                newperson.FirstName = sub[2];
                newperson.LastName = sub[3];
                newperson.Gender = sub[4];
                Plist.Add(newperson);
            }
        }
        return View();

I can upload and read the file, but when I try to add the records to the list I get an error: only 16000 items are read and then my application shuts down.



Solution 1:[1]

You need to read the file through a buffer instead of loading it all at once. With buffered reading logic, only a small part of the file is in memory at any time, so you'll be able to read a file of any size.

This line here:

System.IO.File.ReadAllLines(filepath);

reads ALL the content of the 15 GB file at once and attempts to put it all into memory. I don't know how your code managed to get past that line without throwing an OutOfMemoryException (reading "only" a 4.62 GB file ate 19.2 GB of my memory when debugging).

Instead, use a buffer of a single line:

using var streamReader = File.OpenText(bigFilePath);
var fileLine = string.Empty;

while ((fileLine = streamReader.ReadLine()) != null)
{
    // Your string line reading logic.
}

You will most probably not be able to keep all the records in memory (depending on the memory available), and sending them to Elasticsearch one by one would be the opposite of efficient. So you need to find a middle ground between those two limits. I would suggest batching, that is, sending records in fixed-size groups. The size is for you to pick, but it should be neither very large nor very small, otherwise the benefits of batching shrink.

Full code:

// filePath must be the full path of the uploaded file (the `filepath`
// variable in the question), not just the folder it was saved in.
static void ImportPeople(string filePath)
{
    var peopleListBatch = new List<Person>();
    const int BatchSize = 1024;

    using var streamReader = File.OpenText(filePath);
    string? fileLine;

    // Only one line is held in memory at a time.
    while ((fileLine = streamReader.ReadLine()) != null)
    {
        var lineParts = fileLine.Split(":");

        // Skip blank or malformed lines instead of failing on a missing field.
        if (lineParts.Length < 5)
        {
            continue;
        }

        var newperson = new Person
        {
            PId = lineParts[1],
            FirstName = lineParts[2],
            LastName = lineParts[3],
            Gender = lineParts[4],
        };

        peopleListBatch.Add(newperson);

        // Add to Elastic, but only when batch is full.
        if (peopleListBatch.Count == BatchSize)
        {
            AddPersonsToElasticSearch(peopleListBatch);
            peopleListBatch.Clear();
        }
    }

    // Add remaining people, if any.
    if (peopleListBatch.Count > 0)
    {
        AddPersonsToElasticSearch(peopleListBatch);
        peopleListBatch.Clear();
    }
}
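
The read-and-batch loop above can also be sketched more compactly with File.ReadLines, which enumerates lines lazily, and Enumerable.Chunk, which is available from .NET 6. This is only an illustration under assumptions: the BatchReader class and the ReadBatches method are hypothetical names, and Person is redeclared here just so the block stands on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Redeclared for the sketch; use your own Person model instead.
public class Person
{
    public string PId { get; set; } = "";
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public string Gender { get; set; } = "";
}

public static class BatchReader
{
    // Hypothetical helper: lazily yields arrays of at most batchSize people.
    // File.ReadLines streams the file, so memory use is bounded by roughly one batch.
    public static IEnumerable<Person[]> ReadBatches(string filePath, int batchSize)
    {
        return File.ReadLines(filePath)
            .Select(line => line.Split(':'))
            // Skip blank or malformed lines (fewer than 5 fields).
            .Where(parts => parts.Length >= 5)
            .Select(parts => new Person
            {
                PId = parts[1],
                FirstName = parts[2],
                LastName = parts[3],
                Gender = parts[4],
            })
            .Chunk(batchSize);
    }
}
```

Usage would look like `foreach (var batch in BatchReader.ReadBatches(filepath, 1024)) AddPersonsToElasticSearch(batch.ToList());`. The explicit loop and this version do the same work; which one reads better is a matter of taste.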

Inserting into Elasticsearch is another story, and I leave that task to you:

static void AddPersonsToElasticSearch(List<Person> people)
{
    // TODO: Add your inserting logic here.
}
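
As one possible way to fill in the TODO above, here is a minimal sketch that sends a batch through the Elasticsearch `_bulk` REST endpoint using only HttpClient and System.Text.Json. It is not the answer author's method. The BulkSketch class, both method names, the `http://localhost:9200` address, the `people` index name, and the use of PId as the document `_id` are all assumptions made for illustration. An official .NET client would be another option.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Redeclared for the sketch; use your own Person model instead.
public class Person
{
    public string PId { get; set; } = "";
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public string Gender { get; set; } = "";
}

public static class BulkSketch
{
    // Builds the newline-delimited JSON body the _bulk endpoint expects:
    // an action line followed by the document line, each ending in '\n'.
    public static string BuildBulkBody(IEnumerable<Person> people)
    {
        var body = new StringBuilder();
        foreach (var person in people)
        {
            // Assumption: PId is used as the document _id.
            body.Append(JsonSerializer.Serialize(new { index = new { _id = person.PId } })).Append('\n');
            body.Append(JsonSerializer.Serialize(person)).Append('\n');
        }
        return body.ToString();
    }

    // Hypothetical sender; the URL and index name are placeholders.
    public static async Task SendBatchAsync(HttpClient http, List<Person> people)
    {
        using var content = new StringContent(BuildBulkBody(people), Encoding.UTF8, "application/x-ndjson");
        using var response = await http.PostAsync("http://localhost:9200/people/_bulk", content);

        // This only catches request-level failures. Per-document failures are
        // reported in the response body ("errors": true) and need their own check.
        response.EnsureSuccessStatusCode();
    }
}
```

Because this sketch is asynchronous, the calling code would need to await it (or the reading loop would need to become async as well). Reuse a single HttpClient instance across batches rather than creating one per call.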

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Prolog