There are two options for storing your applications' content data in a Microsoft SharePoint environment.
The first, default option stores the files uploaded to the site as binary content in content database tables. With this option active, the default SharePoint recycle bin settings apply when content is deleted. These settings, described in detail below, keep the deleted content in the content database and let you restore it for up to 60 days (the default is 30 + 30 days).
RBS (Remote Blob Storage)
Remote Blob Storage is a subject of its own, and here we will focus on some of the problems we have faced with it. To learn more about RBS, I recommend reading this article: https://docs.microsoft.com/en-us/sharepoint/administration/rbs-overview
In brief, RBS stores binary BLOB data outside the content database, where it would otherwise be stored by default. When RBS is used, the SharePoint recycle bin works the same way; however, in addition to the default 30 + 30 days, the RBS garbage collection process must also run. Otherwise, deleted SharePoint content will continue to take up space on disk.
SharePoint Recycle Bin
In the digital age we live in, an application that loses deleted content for good, with no way to get it back, is not acceptable. That is why SharePoint, like almost any application, has a “soft delete” concept. SharePoint offers more, though: its recycle bin has two levels. SharePoint also differs from other applications in what it can keep in the recycle bin. A typical application database keeps only records (table rows), while the SharePoint recycle bin can hold lists, libraries and even larger objects such as sites in addition to records.
Users with permission to delete content have their own recycle bins, so deleted content is stored there first. By default, deleted content stays in the user’s recycle bin for 30 days, which means the user can restore the content and its versions within 30 days. The default can be changed for each web application in Central Administration. To change the setting, launch Central Administration as an authorized user and go to Application Management > Manage Web Applications to open the general settings for the web application.
After 30 days, deleted content still in the user’s bin is moved to the second-level bin: the “Site Collection Recycle Bin”. From that point on, the user who deleted the content can no longer act on it. The site collection recycle bin has another default 30-day waiting period. Any content left in this recycle bin is removed completely after those 30 days, and from then on it can only be restored from backups.
To give an example, imagine that we have a file named “Training.docx” in a document library on our SharePoint site. Once an authorized user deletes this file:
- The file is kept in the user’s recycle bin for 30 days in a form the user can restore. The user may restore it or do nothing.
- After 30 days, the file is moved to the site collection recycle bin (second level). The default waiting time is again 30 days, during which a site collection administrator can restore the content.
- After the second 30 days, the content is removed for good.
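To make the arithmetic of the example concrete, here is a minimal Python sketch that computes the dates on which a deleted item changes stage. It assumes the 30 + 30 day defaults described in this article; the function name and the sample deletion date are made up for illustration, so replace the values with your own web application settings.

```python
from datetime import date, timedelta

# Retention periods as described in this article (assumed defaults;
# replace them with the values configured for your web application).
FIRST_STAGE_DAYS = 30
SECOND_STAGE_DAYS = 30


def recycle_bin_timeline(deleted_on: date,
                         first_stage_days: int = FIRST_STAGE_DAYS,
                         second_stage_days: int = SECOND_STAGE_DAYS) -> dict:
    """Return the dates on which a deleted item changes stage."""
    moved_to_second_stage = deleted_on + timedelta(days=first_stage_days)
    permanently_removed = moved_to_second_stage + timedelta(days=second_stage_days)
    return {
        "deleted": deleted_on,
        "moved_to_site_collection_bin": moved_to_second_stage,
        "permanently_removed": permanently_removed,
    }


if __name__ == "__main__":
    # Hypothetical deletion date for "Training.docx".
    for stage, day in recycle_bin_timeline(date(2024, 1, 1)).items():
        print(f"{stage}: {day.isoformat()}")
```

For a file deleted on 1 January 2024, this gives 31 January for the move to the site collection recycle bin and 1 March for the final removal.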
We can now move on to the problem that is the main subject of this article. On a system where RBS is used, the RBS garbage collection process has to run in addition to the recycle bin functionality discussed above. Until it does, deleted content keeps taking up space on your server’s disks even though you can no longer see it in SharePoint. A few deleted files left on disk may not be a big problem, and you may not even notice them. After a mass deletion (for example, 500 GB of content), however, keeping that content on disk brings not only inconvenience but also serious financial costs:
- Your backup and recovery systems will keep backing up 500 GB of unwanted, deleted content.
- You have to reconsider your capacity planning, since there is an extra 500 GB of content on your data disk.
When I had this problem, 1.4 TB of content had been deleted from a system where we stored 2 TB of data (let’s say they were unnecessary images that had served their purpose :)), but the RBS disks were still taking up the same space! At this point, we decided to dig deep and understand how the system worked.
Under normal conditions, RBS has a “garbage collection” mechanism. It runs through the following 3 steps:
- Reference Scan: Compares the BLOBs tracked in the RBS tables with the references in the content database and marks BLOBs that are no longer referenced for deletion.
- Delete Propagation: In this second step, BLOBs that have been marked for deletion for longer than the waiting period are deleted from the blob store.
- Orphan Cleanup: In this last step, BLOBs that exist in the blob store but not in the RBS tables are found and removed.
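To show how I picture these three steps fitting together, here is a toy Python model. It is only a sketch of my understanding, not how the RBS Maintainer is implemented: the real process works against SQL tables and applies the waiting periods discussed next, and all names below (`run_garbage_collection`, `rbs_table`, `blob_store`) are made up for illustration.

```python
def run_garbage_collection(referenced, rbs_table, blob_store):
    """Toy model of the three garbage collection steps.

    referenced: set of blob ids still referenced by content database rows
    rbs_table:  dict of blob id -> state, a stand-in for the RBS tables
    blob_store: set of blob ids physically present in the blob store
    Returns the set of orphaned blob ids removed in the last step.
    """
    # Step 1, reference scan: mark blobs that nothing references anymore.
    for blob_id in rbs_table:
        if blob_id not in referenced:
            rbs_table[blob_id] = "marked_for_deletion"

    # Step 2, delete propagation: remove marked blobs from the store
    # and drop their tracking rows (waiting periods are ignored here).
    marked = [b for b, state in rbs_table.items() if state == "marked_for_deletion"]
    for blob_id in marked:
        blob_store.discard(blob_id)
        del rbs_table[blob_id]

    # Step 3, orphan cleanup: remove blobs in the store that the
    # RBS tables do not know about at all.
    orphans = blob_store - set(rbs_table)
    blob_store.difference_update(orphans)
    return orphans


if __name__ == "__main__":
    table = {"a": "active", "b": "active"}
    store = {"a", "b", "c"}
    removed_orphans = run_garbage_collection({"a"}, table, store)
    print(sorted(store), sorted(removed_orphans))
```

In this model, blob "b" disappears because no content row references it, and blob "c" disappears because it sits in the store without any tracking row.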
The default waiting period for these 3 steps is also 30 days. Given this, we expect a file deleted from SharePoint to be removed by the RBS cleanup 30 days after it has left both recycle bins (where it was kept for 60 days). By default, BLOBs that are no longer referenced by content database records are removed once the waiting period has ended. You may run into problems during this process and have to fix them manually. We have hit such problems twice and had to intervene manually with the steps below.
- Launch SQL Server Management Studio and open a new query window. Select the corresponding content database and run the procedures below to reset the 30-day waiting period for the 3 steps:
exec mssqlrbs.rbs_sp_set_config_value 'garbage_collection_time_window', 'time 00:00:00';
exec mssqlrbs.rbs_sp_set_config_value 'delete_scan_period', 'time 00:00:00';
exec mssqlrbs.rbs_sp_set_config_value 'orphan_scan_period', 'time 00:00:00';
- Open a new command prompt in “Run as Administrator” mode.
- Go to the RBS Maintainer in the default RBS setup directory. In our case, the application is at the following default location:
C:\Program Files\Microsoft SQL Remote Blob Storage 10.50\Maintainer\Microsoft.Data.SqlRemoteBlobs.Maintainer.exe
Run Maintainer.exe with the parameters below:
Microsoft.Data.SqlRemoteBlobs.Maintainer.exe -connectionstringname RBSMaintainerConnection -operation GarbageCollection ConsistencyCheck ConsistencyCheckForStores -GarbageCollectionPhases rdo -ConsistencyCheckMode r -TimeLimit 120
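If you prefer to script this call, for example to run it from a scheduled job, the command line above can be assembled as an argument list. This is a minimal Python sketch that reuses the path and parameters from this article; the function names and the keyword arguments are my own, and the path assumes the default setup directory.

```python
import subprocess
from pathlib import PureWindowsPath

# Default setup location used in this article; adjust it if yours differs.
MAINTAINER_EXE = PureWindowsPath(
    r"C:\Program Files\Microsoft SQL Remote Blob Storage 10.50\Maintainer"
    r"\Microsoft.Data.SqlRemoteBlobs.Maintainer.exe"
)


def build_maintainer_command(connection_string_name="RBSMaintainerConnection",
                             phases="rdo", time_limit=120):
    """Build the argument list for the Maintainer call shown above."""
    return [
        str(MAINTAINER_EXE),
        "-connectionstringname", connection_string_name,
        "-operation", "GarbageCollection", "ConsistencyCheck",
        "ConsistencyCheckForStores",
        "-GarbageCollectionPhases", phases,
        "-ConsistencyCheckMode", "r",
        "-TimeLimit", str(time_limit),
    ]


def run_maintainer():
    """Run the Maintainer; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_maintainer_command(), check=True)
```

Passing a list instead of one long string avoids quoting problems with the spaces in the path. Call `run_maintainer()` from an elevated session on the server where RBS is installed.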
The most critical parameter is “connectionstringname”. The connection string is read from the Maintainer.exe.config file, and it has to be encrypted.
Follow the steps below to encrypt it:
- Rename the file “Microsoft.Data.SqlRemoteBlobs.Maintainer.exe.config” to “web.config”.
- Open a command prompt in Run as Administrator mode, go to the directory “%windir%\Microsoft.NET\Framework64\v2.0.50727” and run the command below:
aspnet_regiis -pef connectionStrings "%programfiles%\Microsoft SQL Remote Blob Storage 10.50\Maintainer" -prov DataProtectionConfigurationProvider
- Rename “web.config” back to “Microsoft.Data.SqlRemoteBlobs.Maintainer.exe.config”, then run the Maintainer.exe command shown earlier.
- The Maintainer command may take a while depending on the amount of data. In my case, it took 55–65 minutes to free 1.4 TB of space in a 2 TB content database.
- Unfortunately, this is not the last step. After the process is done, run CHECKPOINT in SQL Server Management Studio if your database recovery model is “Simple”. If your recovery model is “Full”, do not forget the transaction log backups as well.
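The rename, encrypt, rename-back sequence from the encryption steps above is easy to get wrong by hand, because a failed command can leave the file stuck as “web.config”. Here is a minimal Python sketch of that sequence, assuming the file names and the aspnet_regiis arguments used in this article. The function name and the injectable `run` parameter are my own additions for illustration.

```python
import subprocess
from pathlib import Path

CONFIG_NAME = "Microsoft.Data.SqlRemoteBlobs.Maintainer.exe.config"


def encrypt_connection_strings(maintainer_dir, aspnet_regiis, run=subprocess.run):
    """Temporarily rename the config to web.config and encrypt it in place.

    maintainer_dir: directory that contains the Maintainer config file
    aspnet_regiis:  path to aspnet_regiis.exe
    run:            callable used to start the process (replaceable for testing)
    """
    maintainer_dir = Path(maintainer_dir)
    original = maintainer_dir / CONFIG_NAME
    temporary = maintainer_dir / "web.config"

    original.rename(temporary)
    try:
        run(
            [str(aspnet_regiis), "-pef", "connectionStrings", str(maintainer_dir),
             "-prov", "DataProtectionConfigurationProvider"],
            check=True,
        )
    finally:
        # Restore the original file name even if the encryption fails.
        temporary.rename(original)
```

The `try`/`finally` block is the point of the sketch: the Maintainer only finds its configuration under the original name, so the name is restored no matter how the encryption command ends.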
As the last step, run the command below (you may need to run it twice):
USE <Content Database>;
EXEC sp_filestream_force_garbage_collection @dbname = N'<Content Database>';
After all steps are done, you will see that the deleted files have also been removed from your disk.
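To check the result with numbers rather than by eye, you can measure the size of the blob store directory before and after the cleanup. Here is a minimal Python sketch; the function name is my own, and you have to pass it the path of your own blob store, which is not given in this article.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file may disappear while garbage collection is running.
                pass
    return total
```

Run it once before the steps above and once after, then compare the two values. In a case like mine, the difference should come close to the amount of deleted content.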