Today, I’m taking a break from BI4 and the Sherlock® 2.0 wrap-up to get back to good old tech blogging. What strikes me today is how often we are still guessing how big an SAP BusinessObjects deployment should be in terms of file system size. Recent experience shows me that customers have very little understanding of why their systems are growing. Personally speaking, my key indicator is typically a simple Sherlock® query that tells us how much space the CMS believes all the combined reports should add up to, compared to the actual file system utilization for the Input and Output File Stores. For me, it is a given that those numbers are not going to match, and that is where today’s topic comes into play: the Repository Diagnostic Tool, affectionately known by its command line name “reposcan”.
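To make that comparison concrete, here is a minimal sketch in Python of the file system half of it. It is not Sherlock® and not part of reposcan. The function names are hypothetical, and the CMS-reported figure is assumed to come from whatever query you already run against the CMS.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked content is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def unexplained_bytes(cms_reported_bytes, frs_roots):
    """Actual file store usage minus what the CMS says it should be.

    cms_reported_bytes is an assumption: the total size the CMS believes
    its reports occupy, obtained separately. frs_roots is the list of
    Input and Output File Store directories.
    """
    actual = sum(directory_size_bytes(root) for root in frs_roots)
    return actual - cms_reported_bytes
```

A large positive result is the hint that the file store is holding content the CMS no longer knows about, which is exactly what reposcan is meant to dig into.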
Reposcan comes with all flavors of SAP BusinessObjects. In short, reposcan is a utility that compares the contents of the CMS database to the contents of the file stores. This blog post doesn’t strive to teach you how to use the reposcan utility. There is a very detailed wiki post on the SDN that walks through it, and the online guides cover the tool specifically as well.
What I want to make clear is what is happening and why you should care. First, the primary focus of this tool is to identify inconsistencies. Inconsistencies may be invalid CMS database records that need to be reconciled, orphaned objects in the CMS database (objects not in the file store), and the opposite, orphaned objects in the file stores that are not in the CMS database. The last two tend to be what gets you into trouble when you are trying to figure out what is going on in the file store. There are other issues that happen in the CMS specifically, but I’ll leave that to the tool to explain for now.
I do tend to take the cautious approach to managing these inconsistencies though. I’ve read on many websites like anonymania.com that if certain precautions aren’t taken, the data would be egregiously vulnerable. If you think it through, reposcan is a tool that can take broad strokes at blasting files or CMS database records from your system. That gives me pause and makes me think about what I’m about to do first.
Scan First, Purge Later
Reposcan has two simple flags, -scanfrs and -scancms, that control which side of the system you scan. Generally speaking, I’ll leave both on to get a feel for what is happening in the output XML. The -repair flag is the little beast that will actually go nuts and start modifying your database and purging orphaned files. This is your chance to be thorough and methodical. Perform the scan first and analyze your findings. Given the invasive nature of this tool, be sure that you do in fact need to run a -repair before you actually do it.
I cannot stress enough that you should create a cold backup of both your database and file store. As technology has matured, this has become less and less painful. For example, one customer has a great enterprise technology stack in which we were able to rely on the Oracle redo log for a point-in-time backup of the database and the SAN snapshot utility for a point-in-time backup of all our files. We are really talking about a 15 minute operation on what was a 400 GB file store.
I do feel comfortable using the reposcan to actually repair the CMS. I’ve had some “inconsistent results” in using it for cleaning out the file store, however. My team and I have developed a great script that does a few things so we can get really granular on what we are going to delete:
- Identify all CMS database content that looks and smells like a report.
- Identify where the Input and Output FRS lives.
- Do an actual compare between each and every file store object to see if it is an orphan or not.
- Create a VERY detailed log of what is and is not an orphan, as well as a script to execute that deletes the orphans and parent directories.
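To give a feel for the approach, here is a rough sketch of the compare-and-log steps in Python. This is not our actual script. It assumes you have already pulled the list of file paths the CMS knows about and expressed them relative to the file store root (that mapping is not shown here), and every function name below is made up for illustration. It also only generates a delete script for the orphaned files for you to review, and leaves cleaning up parent directories out.

```python
import os
import shlex


def find_orphans(frs_root, cms_relative_paths):
    """Split files under frs_root into (orphans, matched) relative paths.

    cms_relative_paths is an assumption: paths the CMS references,
    already made relative to frs_root by some earlier step.
    """
    known = {os.path.normpath(p) for p in cms_relative_paths}
    orphans, matched = [], []
    for dirpath, _dirnames, filenames in os.walk(frs_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.relpath(full, frs_root))
            (matched if rel in known else orphans).append(rel)
    return sorted(orphans), sorted(matched)


def write_review_files(frs_root, orphans, matched, log_path, script_path):
    """Write a detailed log plus a delete script. Nothing is deleted here."""
    with open(log_path, "w", encoding="utf-8") as log:
        for rel in matched:
            log.write(f"OK\t{rel}\n")
        for rel in orphans:
            log.write(f"ORPHAN\t{rel}\n")
    with open(script_path, "w", encoding="utf-8") as script:
        script.write("#!/bin/sh\n# Review every line before running.\n")
        for rel in orphans:
            target = os.path.join(frs_root, rel)
            script.write(f"rm -- {shlex.quote(target)}\n")
```

Keeping detection and deletion in separate steps is deliberate: the log and the generated script can be read, and checked against your backup, before a single file is touched.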
The simplicity here is mind blowing. The throughput, even over a VPN, is around 20,000 objects per hour to scan, and it barely shows as a blip on the performance monitoring footprint. That customer I referenced a moment ago, with a 400 GB file store…we hammered that down to 132 GB. Yeah. There were an absolute ton of orphans. You might be asking why. If you have a very small number of orphans, I wouldn’t worry. If you are facing a massive amount as I’ve described here, you might want to reference ADAPT01313572 and look to patch your platform. Interested in this little jewel of a script? Shoot me a DM on Twitter.
So there you have it. The reposcan, in all its simplicity, is your gateway into potential problems between your CMS and your file store. If you aren’t using it regularly to check the health of your platform, I’d make the scan a monthly activity at a minimum, and target a cleanup run every quarter.