C# – VSTO – Extract images from PowerPoint presentation

Overview

Background

In one of my previous posts, I created a very basic VSTO example that adds a button to the PowerPoint ribbon.

Recently, I had a task that required enumerating all the pictures in a PowerPoint presentation and extracting them into a zip file.

A PowerPoint presentation can contain many different shapes, such as rectangles, lines, arrows, text boxes, pictures and more. Each shape might contain text, but it might also contain a picture.

In this post, I am going to show how to extract the images using two different techniques:

  • PowerPoint COM Interop API
  • Extract directly from ZIP file

My Stack

  • Visual Studio 2019 Community.
  • .NET Framework 4.7.2 / C#
  • Office 365, Desktop Edition.
  • Windows 10 Pro 64-bit (10.0, Build 19041)
  • PowerPoint Interop DLL version 15

Working with PowerPoint C# Interop version 15.0.0.0

Step 1 - Create a button

As shown in the previous example, I am adding a button element to the Ribbon XML. This button has a callback set in its onAction attribute.

<customUI xmlns='http://schemas.microsoft.com/office/2009/07/customui'>
  <ribbon>
    <tabs>
      <tab id='sample_tab' label='GoTask'>
        <group id='sample_group' label='Operations'>
          <button id='extract_images' label='Extract Images' size='large' getImage='OnGetImage' onAction='OnExtractImage'/>
        </group>
      </tab>
    </tabs>
  </ribbon>
</customUI>
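For orientation, here is a minimal sketch of what the C# side of the onAction callback might look like. The class name SampleRibbon and the method body are assumptions made for illustration. Only the method signature (a public void method taking an IRibbonControl) and the name OnExtractImage, which must match the XML above, are fixed by the Ribbon callback mechanism.

```csharp
using System.Diagnostics;
using Office = Microsoft.Office.Core;

// Hypothetical class name. In a VSTO project this would be the class that
// implements Office.IRibbonExtensibility and returns the Ribbon XML.
public partial class SampleRibbon
{
  // Signature expected for a button's onAction callback.
  // The method name must match onAction='OnExtractImage' in the XML.
  public void OnExtractImage(Office.IRibbonControl control)
  {
    // control.Id holds the id attribute of the clicked button.
    Debug.WriteLine($"Ribbon button clicked: {control.Id}");

    // The extraction logic described in the following steps
    // would be called from here.
  }
}
```

If the method name or signature does not match, the button still appears in the ribbon, but clicking it will not reach your code.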

Step 2 - Collect the images from different shapes

A PowerPoint presentation can store images in several shape types. All shape types are represented by the MsoShapeType enum. To recognize the shape type, we are going to use the Shape.Type and Shape.PlaceholderFormat.ContainedType properties:

  • Picture - MsoShapeType.msoPicture or MsoShapeType.msoLinkedPicture
  • Picture contained in a placeholder - MsoShapeType.msoPlaceholder
  • Other shapes that might have a PictureFormat property properly initialized.

In the sample presentation, I've created a few shapes that contain pictures in different formats.

To extract the image, I am going to use the PowerPoint Shape.Export function.

In order to choose a directory for saving the images, I am going to use the CommonOpenFileDialog implemented in Microsoft-WindowsAPICodePack-Shell. Here is the sample implementation of using a directory picker:

private string GetSaveDir()
{
  using (var dialog = new CommonOpenFileDialog())
  {
    // show a folder picker instead of a file picker
    dialog.IsFolderPicker = true;

    var result = dialog.ShowDialog();

    if (result == CommonFileDialogResult.Ok)
    {
      return dialog.FileName;
    }
  }

  // the user cancelled the dialog
  return null;
}

The code below iterates over all slides in the presentation and extracts the images from the shapes.

Please note the following remarks:

  • The images are exported in PNG format by passing the PpShapeFormat.ppShapeFormatPNG enum value. You can specify JPG, BMP or other formats defined in the PpShapeFormat enum.

  • Pay attention to the shape.PictureFormat.CropBottom check. Generally, every shape has PictureFormat set to a non-null value, so we can't filter out non-picture shapes by testing this property for null. The trick is to try to read one of its properties (CropBottom or any other). If an exception is thrown, the shape is not a picture and we can skip it.

var i = 1;
foreach (Slide slide in app.ActivePresentation.Slides)
{
  foreach (Shape shape in slide.Shapes)
  {
    var doExport = false;

    if (shape.Type == MsoShapeType.msoPicture ||
      shape.Type == MsoShapeType.msoLinkedPicture)
    {
      // a regular or linked picture
      doExport = true;
    }
    else if (shape.Type == MsoShapeType.msoPlaceholder)
    {
      // a placeholder that contains a picture
      if (shape.PlaceholderFormat.ContainedType == MsoShapeType.msoPicture ||
        shape.PlaceholderFormat.ContainedType == MsoShapeType.msoLinkedPicture)
      {
        doExport = true;
      }
    }
    else
    {
      try
      {
        // Probe only: if the shape holds no picture, any attempt
        // to read CropBottom throws an exception.
        var test = shape.PictureFormat.CropBottom > -1;
        doExport = true;
      }
      catch
      {
        doExport = false;
      }
    }

    if (doExport)
      shape.Export(Path.Combine(saveDirectory, $"{i++}.png"), PpShapeFormat.ppShapeFormatPNG);
  }
}
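If you export in a format other than PNG, the file extension should change with it. The helper below is a small sketch of one way to keep the two in sync. ShapeFormatExtensions and GetFileExtension are hypothetical names that are not part of the original project; the enum members are the ones defined in PpShapeFormat.

```csharp
using System;
using Microsoft.Office.Interop.PowerPoint;

// Hypothetical helper, not part of the original project.
public static class ShapeFormatExtensions
{
  // Maps an export format to the extension used for the saved file.
  public static string GetFileExtension(PpShapeFormat format)
  {
    switch (format)
    {
      case PpShapeFormat.ppShapeFormatPNG: return ".png";
      case PpShapeFormat.ppShapeFormatJPG: return ".jpg";
      case PpShapeFormat.ppShapeFormatBMP: return ".bmp";
      case PpShapeFormat.ppShapeFormatGIF: return ".gif";
      case PpShapeFormat.ppShapeFormatWMF: return ".wmf";
      case PpShapeFormat.ppShapeFormatEMF: return ".emf";
      default:
        throw new ArgumentOutOfRangeException(nameof(format));
    }
  }
}
```

With this in place, the file name passed to shape.Export could be built as i++ + ShapeFormatExtensions.GetFileExtension(format), with the same format value passed as the second argument.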

When you run this code on the presentation provided with the project, it should export 4 pictures to the chosen directory. (Picture credit: Unsplash)

Working with ZIP file to extract the images

The pptx format is actually a zip file with a well-formed structure defined by the Open XML format. You can open the pptx file with any zip extractor and look at its contents. Fortunately, the pictures are stored in the ppt\media directory within the archive.
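To see this structure for yourself without a zip tool, you can list the media entries from code. This is a minimal sketch: PptxInspector and ListMediaEntries are hypothetical names, and on .NET Framework the project is assumed to reference System.IO.Compression and System.IO.Compression.FileSystem. Note that entry names inside the archive use forward slashes, so the prefix to look for is ppt/media/.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class PptxInspector
{
  // Returns the archive paths of all files stored under ppt/media.
  public static List<string> ListMediaEntries(string pptxFile)
  {
    using (var archive = ZipFile.OpenRead(pptxFile))
    {
      return archive.Entries
        .Where(e => e.FullName.StartsWith("ppt/media/", StringComparison.OrdinalIgnoreCase)
                    && e.Name.Length > 0) // skip directory entries
        .Select(e => e.FullName)
        .ToList(); // materialize before the archive is disposed
    }
  }
}
```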

All I have to do now is extract the archive and grab the images.

I am going to use the .NET ZipFile class located in the System.IO.Compression namespace.

  1. Open the pptx file using ZipFile.Open
  2. Create a temporary temp_zip directory to extract the files to
  3. Copy the media files
  4. Delete the temporary temp_zip directory

private void ExtractWithZip(string pptxFile, string directory)
{
  var zipDir = Path.Combine(directory, "temp_zip");

  try
  {
    Directory.CreateDirectory(zipDir);

    using (var arh = ZipFile.Open(pptxFile, ZipArchiveMode.Read))
    {
      arh.ExtractToDirectory(zipDir); // extract the whole archive
    }

    // a presentation without pictures has no ppt\media directory
    var mediaDir = Path.Combine(zipDir, "ppt", "media");
    if (!Directory.Exists(mediaDir))
      return;

    // copy the extracted media files, overwriting files from a previous run
    foreach (var f in Directory.GetFiles(mediaDir))
      File.Copy(f, Path.Combine(directory, Path.GetFileName(f)), true);
  }
  finally
  {
    // clean up the temporary directory, even if the extraction failed
    try
    {
      Directory.Delete(zipDir, true);
    }
    catch
    {
      // best effort: ignore a temporary directory that cannot be removed
    }
  }
}
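As a variation, the temporary directory can be avoided by copying the media entries straight out of the archive. The sketch below is one possible way to do it, not the code from the project. PptxMediaExtractor and ExtractMedia are hypothetical names, and it assumes all pictures sit directly under ppt/media, so same-named files in subfolders would overwrite each other.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original project.
public static class PptxMediaExtractor
{
  // Writes every file under ppt/media into targetDirectory and
  // returns the number of files written.
  public static int ExtractMedia(string pptxFile, string targetDirectory)
  {
    Directory.CreateDirectory(targetDirectory);
    var count = 0;

    using (var archive = ZipFile.OpenRead(pptxFile))
    {
      foreach (var entry in archive.Entries)
      {
        // entry names inside the archive use forward slashes
        var isMedia = entry.FullName.StartsWith("ppt/media/", StringComparison.OrdinalIgnoreCase);
        if (!isMedia || entry.Name.Length == 0)
          continue; // not a media file, or a directory entry

        // entry.Name is the file name without the archive folders
        entry.ExtractToFile(Path.Combine(targetDirectory, entry.Name), true);
        count++;
      }
    }

    return count;
  }
}
```

Because only the matching entries are written, nothing needs to be cleaned up afterwards, and a presentation without pictures simply yields a count of 0.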

Useful resources

  • Source code of this project on GitHub