Processing Lots of Files in C#


Elliot came to see me today with a need to process a whole bunch of files on a disk. I quite enjoy playing with code and so we spent a few minutes building a framework which would work through a directory tree and allow him to work on each file in turn. Then I thought it was worth blogging, and here we are.

Finding All the Files in a Directory

The first thing you want to do is find all the files in a directory. Suppose we put the path to the directory into a string:

string startPath = @"c:\users\Rob\Documents";

Note that I’ve used the special version of string literal with the @ in front (a verbatim string literal). This is so my string can contain backslash characters without them being interpreted as the start of escape sequences. I want to actually use backslash (\) without it being read as part of something like a newline (\n).
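To make the difference concrete, here is a small sketch that writes the same path both ways. The class and method names (VerbatimStringDemo, Verbatim, Escaped, Show) are made up for this illustration.

```csharp
using System;

static class VerbatimStringDemo
{
    // Verbatim literal: backslashes are taken as they are
    public static string Verbatim()
    {
        return @"c:\users\Rob\Documents";
    }

    // Regular literal: each backslash has to be doubled up
    public static string Escaped()
    {
        return "c:\\users\\Rob\\Documents";
    }

    // Prints True, because both literals produce the same characters
    public static void Show()
    {
        Console.WriteLine(Verbatim() == Escaped());
    }
}
```

Both forms give exactly the same string, so which one you use is a matter of readability. For Windows paths the @ version is usually easier on the eye.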

I can find all the files in that directory by using the Directory.GetFiles method, which is in the System.IO namespace. It returns an array of strings with all the filenames in it.

string [] filenames = Directory.GetFiles(startPath);
for (int i = 0; i < filenames.Length; i++)
{
   Console.WriteLine("File : " + filenames[i]);
}

This lump of C# will print out the names of all the files in the startPath directory. So now Elliot can work on each file in turn.
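If only some of the files are interesting, Directory.GetFiles also has an overload that takes a search pattern such as "*.txt". The following is a minimal sketch of that idea using a foreach loop; the FileLister class and its method names are my own invention, and the pattern is only an example.

```csharp
using System;
using System.IO;

static class FileLister
{
    // Returns the files directly inside 'path' whose names match 'pattern'.
    // Subdirectories are not searched.
    public static string[] FindFiles(string path, string pattern)
    {
        return Directory.GetFiles(path, pattern);
    }

    // Prints each matching file name, one per line
    public static void PrintFiles(string path, string pattern)
    {
        foreach (string filename in FindFiles(path, pattern))
        {
            Console.WriteLine("File : " + filename);
        }
    }
}
```

A call such as FileLister.PrintFiles(startPath, "*.txt") would then list just the text files in the directory.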

Finding all the Directories in a Directory

Unfortunately my lovely solution doesn’t actually do all that we want. It will pull out all the files in a directory, but we also want to work on the contents of the directories inside that directory. It turns out that getting all the directories in a directory is very easy as well. You use the Directory.GetDirectories method:

string [] directories =
          Directory.GetDirectories(startPath);
for (int i = 0; i < directories.Length; i++)
{
    Console.WriteLine("Directory : " + directories[i]);
}

This lump of C# will print out all the directories in the path that was supplied.
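The strings that come back from GetDirectories are paths that include the start path, not bare names. If you only want the name of each directory, Path.GetFileName will pick off the last part of the path. This is a small sketch of that; DirectoryLister and SubdirectoryNames are names I have made up for the example.

```csharp
using System;
using System.IO;

static class DirectoryLister
{
    // Returns just the final name of each directory directly inside 'path'
    public static string[] SubdirectoryNames(string path)
    {
        string[] directories = Directory.GetDirectories(path);
        string[] names = new string[directories.Length];
        for (int i = 0; i < directories.Length; i++)
        {
            // GetFileName returns the part after the last directory separator
            names[i] = Path.GetFileName(directories[i]);
        }
        return names;
    }
}
```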

Processing a Whole Directory Tree

I can make a method which will process all the files in a single directory. This could be version 1.0:

static void ProcessFiles(string startPath)
{
   Console.WriteLine("Processing: " + startPath); 
   string [] filenames = Directory.GetFiles(startPath); 
   for (int i = 0; i < filenames.Length; i++)
   {
      // This is where we process the files themselves
      Console.WriteLine("Processing: " + filenames[i]); 
   }
}

I can use it by calling it with a path to work on:

ProcessFiles(@"c:\users\Rob\Documents");

This would work through all the files in my Documents directory. Now I need to improve the method to make it work through an entire directory tree. It turns out that this is really easy too. We can use recursion.

Recursive solutions appear when we define a solution in terms of itself. In this situation we say things like: “To process a directory we must process all the directories in it”.  From a programming perspective recursion is where a method calls itself.  We want to make ProcessFiles call itself for every directory in the start path.

static void ProcessFiles(string startPath)
{
  Console.WriteLine("Processing: " + startPath); 

  string [] directories = 
                  Directory.GetDirectories(startPath); 
  for (int i = 0; i < directories.Length; i++)
  {
    ProcessFiles(directories[i]);
  }

  string [] filenames = Directory.GetFiles(startPath); 
  for (int i = 0; i < filenames.Length; i++)
  { 
    Console.WriteLine("Processing : " + filenames[i]); 
  }
}

The clever, recursive bit is the first loop. This uses the code we have already seen to get a list of all the directory paths, and then calls ProcessFiles (i.e. itself) to work on each of them. If you compile this method (remember to add using System.IO; to the top so that you can get hold of all these useful methods) you will find that it will print out all the files in all the directories.
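One way to turn this into something closer to a reusable framework is to pass in the work to be done on each file as a delegate, and to skip over any directory the program is not allowed to read rather than stopping with an exception. The sketch below is one possible shape for that, not the code Elliot and I wrote. The TreeWalker class name and the processFile parameter are my own, and quietly skipping unreadable directories is an assumption that may not suit every job.

```csharp
using System;
using System.IO;

static class TreeWalker
{
    // Calls 'processFile' once for every file in 'startPath' and below
    public static void ProcessFiles(string startPath, Action<string> processFile)
    {
        string[] directories;
        string[] filenames;
        try
        {
            directories = Directory.GetDirectories(startPath);
            filenames = Directory.GetFiles(startPath);
        }
        catch (UnauthorizedAccessException)
        {
            // Assumption: an unreadable directory is reported and skipped
            Console.WriteLine("Skipping : " + startPath);
            return;
        }

        // The recursive part: work through each directory in turn
        foreach (string directory in directories)
        {
            ProcessFiles(directory, processFile);
        }

        // Then hand each file in this directory to the caller's code
        foreach (string filename in filenames)
        {
            processFile(filename);
        }
    }
}
```

You could call it with a lambda, for example TreeWalker.ProcessFiles(@"c:\users\Rob\Documents", name => Console.WriteLine("Processing : " + name)); and swap the lambda for whatever needs doing to each file. If you don't need this level of control, Directory.GetFiles also has an overload that takes SearchOption.AllDirectories and walks the tree for you.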

Console Window Tip:  If you want to pause the listing as it whizzes past in the command window you can hold down CTRL and press S to stop the display, and CTRL+Q to resume it.