C# – VSTO – Extract images from PowerPoint presentation
Overview
Background
In one of my previous posts, I created a very basic VSTO example that adds a button to the PowerPoint ribbon.
Recently, I had a task in which I needed to enumerate all the pictures in a PowerPoint presentation and extract them into a zip file.
A PowerPoint presentation might contain many different shapes, such as rectangles, lines, arrows, textboxes, pictures and more. Each shape might contain text, but it might also contain a picture.
In this post, I am going to show how to extract the images using 2 different techniques:
- PowerPoint COM Interop API
- Extract directly from ZIP file
My Stack
- Visual Studio 2019 Community.
- .NET Framework 4.7.2 / C#
- Office 365, Desktop Edition.
- Windows 10 Pro 64-bit (10.0, Build 19041)
- PowerPoint Interop DLL version 15
Working with PowerPoint C# Interop version 15.0.0.0
Step 1 - Create a button
As shown in the previous example, I am adding a button element to the Ribbon XML. This button will have a callback set in the onAction attribute.
<customUI xmlns='http://schemas.microsoft.com/office/2009/07/customui'>
  <ribbon>
    <tabs>
      <tab id='sample_tab' label='GoTask'>
        <group id='sample_group' label='Operations'>
          <button id='extract_images' label='Extract Images' size='large' getImage='OnGetImage' onAction='OnExtractImage'/>
        </group>
      </tab>
    </tabs>
  </ribbon>
</customUI>
Step 2 - Collect the images from different shapes
A PowerPoint presentation can store images in a few shape types. All the different shape types are represented by the MsoShapeType enum. To recognize the shape type, we are going to use the Shape.Type and Shape.PlaceholderFormat.ContainedType properties:
- Picture: MsoShapeType.msoPicture or MsoShapeType.msoLinkedPicture
- Picture contained in a placeholder: MsoShapeType.msoPlaceholder
- Other shapes that might have a PictureFormat property properly initialized.
In the sample presentation, I've created a few shapes that contain pictures in different formats.
To extract the image, I am going to use the PowerPoint Shape Export function.
In order to choose a directory for saving the images, I am going to use the CommonOpenFileDialog implemented in Microsoft-WindowsAPICodePack-Shell. Here is the sample implementation of using a directory picker:
private string GetSaveDir()
{
    using (var dialog = new CommonOpenFileDialog())
    {
        // show a folder picker instead of a file picker
        dialog.IsFolderPicker = true;

        var result = dialog.ShowDialog();

        if (result == CommonFileDialogResult.Ok)
        {
            return dialog.FileName;
        }
    }

    // the user cancelled the dialog
    return null;
}
The code below iterates over all slides in the presentation and extracts the images from the shapes.
Please note the following remarks:
- The extracted images are in PNG format, selected with the PpShapeFormat.ppShapeFormatPNG enum value. You can specify JPG, BMP or other formats defined in the PpShapeFormat enum.
- Pay attention to the shape.PictureFormat.CropBottom check. Generally, every shape has PictureFormat set to a non-null value, so we can't filter out non-picture shapes by checking this property for null. The trick is to try to access one of its properties (CropBottom or another). If an exception is thrown, we can skip the object (it's not a picture).
var i = 1;
foreach (Slide slide in app.ActivePresentation.Slides)
{
    foreach (Shape shape in slide.Shapes)
    {
        var doExport = false;

        if (shape.Type == MsoShapeType.msoPicture ||
            shape.Type == MsoShapeType.msoLinkedPicture)
        {
            doExport = true;
        }
        else if (shape.Type == MsoShapeType.msoPlaceholder)
        {
            if (shape.PlaceholderFormat.ContainedType == MsoShapeType.msoPicture ||
                shape.PlaceholderFormat.ContainedType == MsoShapeType.msoLinkedPicture)
            {
                doExport = true;
            }
        }
        else
        {
            try
            {
                // this is just dummy code. In case there is no picture in the
                // shape, any attempt to read the CropBottom property will throw
                // an exception
                var test = shape.PictureFormat.CropBottom > -1;
                doExport = true;
            }
            catch
            {
                doExport = false;
            }
        }

        if (doExport)
            shape.Export(Path.Combine(saveDirectory, $"{i++}.png"), PpShapeFormat.ppShapeFormatPNG);
    }
}
Running this code on the presentation provided with the project should export 4 pictures to the chosen directory. (Picture credit: Unsplash)
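Since the original task was to deliver the pictures as a zip file, the export directory can be packed once the loop has finished. The following is only a minimal sketch: the ImagePackager class and the PackDirectory method are hypothetical names that are not part of the project, and it assumes the zip file is written outside the directory being packed.

```csharp
using System.IO;
using System.IO.Compression;

public static class ImagePackager
{
    // Packs every file in imagesDirectory into a new archive at zipPath.
    // Assumption: zipPath is NOT inside imagesDirectory, otherwise the
    // archive would try to include itself while it is being written.
    public static void PackDirectory(string imagesDirectory, string zipPath)
    {
        // CreateFromDirectory throws if the target file already exists
        if (File.Exists(zipPath))
            File.Delete(zipPath);

        ZipFile.CreateFromDirectory(imagesDirectory, zipPath);
    }
}
```

It could be called right after the export, for example with PackDirectory(saveDirectory, someZipPath), where someZipPath is a location chosen by the user. On .NET Framework the project needs a reference to System.IO.Compression.FileSystem.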
Working with ZIP file to extract the images
The pptx format is actually a zip file with a well-defined structure specified by the Open XML format. You can open the pptx file with any zip extractor and look at its contents. Fortunately, the pictures are stored in the ppt\media directory within the archive.
All I have to do now is extract the archive and grab the images.
I am going to use the .NET ZipFile class located in the System.IO.Compression namespace.
- Open the pptx file using ZipFile.Open
- Create a temporary temp_zip directory to extract the files to
- Copy the media files
- Delete the temporary temp_zip directory
// Requires: using System.IO; using System.IO.Compression;
private void ExtractWithZip(string pptxFile, string directory)
{
    var zipDir = Path.Combine(directory, "temp_zip");

    try
    {
        using (var archive = ZipFile.Open(pptxFile, ZipArchiveMode.Read))
        {
            Directory.CreateDirectory(zipDir);
            archive.ExtractToDirectory(zipDir); // extract the whole archive
        }

        // the media folder does not exist when the presentation has no pictures
        var mediaDir = Path.Combine(zipDir, @"ppt\media");
        if (Directory.Exists(mediaDir))
        {
            // copy the media files, overwriting files left by a previous run
            foreach (var f in Directory.GetFiles(mediaDir))
                File.Copy(f, Path.Combine(directory, Path.GetFileName(f)), true);
        }
    }
    finally
    {
        // clean up, even when the extraction or the copy failed
        try
        {
            Directory.Delete(zipDir, true);
        }
        catch
        {
            // best effort: a leftover temp_zip directory is not fatal
        }
    }
}
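The temporary directory can also be avoided by reading the media entries straight from the archive. This is a minimal sketch rather than the project's implementation: the PptxMedia class and the ExtractMedia method are hypothetical names, and it assumes that only the entries under ppt/media are of interest.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PptxMedia
{
    // Copies every file stored under ppt/media in the pptx archive into
    // targetDirectory and returns the number of files written.
    public static int ExtractMedia(string pptxFile, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        var count = 0;

        using (var archive = ZipFile.OpenRead(pptxFile))
        {
            foreach (var entry in archive.Entries)
            {
                // entry names inside a zip archive use forward slashes
                if (!entry.FullName.StartsWith("ppt/media/", StringComparison.OrdinalIgnoreCase))
                    continue;

                // directory entries have an empty Name, nothing to extract
                if (string.IsNullOrEmpty(entry.Name))
                    continue;

                // use only the file name so an entry cannot escape targetDirectory
                entry.ExtractToFile(Path.Combine(targetDirectory, entry.Name), true);
                count++;
            }
        }

        return count;
    }
}
```

Compared with extracting the whole archive, this touches only the picture files and leaves nothing to clean up afterwards. Existing files with the same name in the target directory are overwritten.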
Useful resources
- Source code of this project on GitHub