Handling millions of attachments in JIRA


Posted by
Sean YONG

August 4, 2016

The question

Question: “Can JIRA handle tens of millions of attachment additions yearly?”

Answer: this is a common question among big corporations, and the answer is yes!

File systems

Before we dive deep into the topic, let’s look at what the question actually means.

You may well ask “what is a file system?” Well, a file system is a method or data structure that controls how data is stored, named and retrieved within an operating system on a disk or partition. In layman’s terms, it’s a way to organise files on a disk or partition. Without a file system, a system cannot identify where one piece of information starts and ends, and as a result all you would see is one big chunk of open-ended data.

Most software operates on a file system. In fact, every storage device must be formatted with a file system, otherwise it will not function at all. For example, Windows, Unix-based operating systems and Mac OS all have file systems in which files are placed and stored in a hierarchical tree structure. You can see this every day on any computer, where files are stored in a hierarchy of directories and subdirectories.

Besides storing files and folders in a tree structure, file systems also control which users or groups have read and/or write access, typically through permissions tied to user accounts. Encryption is also one of the most commonly used methods to prevent unauthorised access. An encryption key is used to encrypt plain text or data so that read and/or write access is limited to the users who hold the corresponding decryption key.

Types of file systems

There are various types of file systems available today, including XFS, Ext4 (Extended File System), FAT (File Allocation Table), NTFS (New Technology File System) and APFS (Apple File System). In this post, however, we are going to discuss Ext4 and XFS, since they are the most common choices for Unix server deployments in big corporations. You may be wondering what the difference between Ext4 and XFS is. Both of these file systems support large volume sizes. Here’s an overview comparison of Ext4 and XFS, based on Ric Wheeler’s research:

Ext4
  • Supports volume sizes up to 1 exbibyte (EiB) and file sizes up to 16 tebibytes (TiB).
  • Performs better than its predecessors (Ext2, Ext3) in general.
  • Handles file maintenance better (for example, creating and/or removing a million files).
  • Is slower to create a file system, because static inode tables have to be created.
XFS
  • Supports volume sizes up to 16 exabytes (EB) and file sizes up to 8 exabytes (EB).
  • Creates file systems faster.
  • Performs worse at file maintenance (for example, creating and/or removing a million files).
The exbibyte (EiB) is equal to 1024^6 (2^60) bytes and the exabyte (EB) is equal to 1000^6 (10^18) bytes. Therefore, 1 EiB is about 15% larger than 1 EB.
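
The arithmetic behind that 15% figure is easy to check. Here is a minimal Java sketch; the class and constant names are made up for illustration, and both values happen to fit in a 64-bit long.

```java
class StorageUnits {
    // 1 EiB = 1024^6 = 2^60 bytes (binary prefix)
    static final long EXBIBYTE = 1L << 60;
    // 1 EB = 1000^6 = 10^18 bytes (decimal prefix)
    static final long EXABYTE = 1_000_000_000_000_000_000L;

    // How many times larger an exbibyte is than an exabyte
    static double ratio() {
        return (double) EXBIBYTE / (double) EXABYTE;
    }

    public static void main(String[] args) {
        System.out.println("1 EiB = " + EXBIBYTE + " bytes");
        System.out.println("1 EB  = " + EXABYTE + " bytes");
        System.out.println("EiB / EB = " + ratio());
    }
}
```

The ratio comes out at roughly 1.153, which is where "about 15% larger" comes from.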

What’s the real showstopper here?

The backup and restore mechanism may not work well on either Ext4 or XFS when it has to deal with the enormous number of subdirectories created by JIRA, especially when the number of issues grows rapidly. JIRA creates a unique subdirectory for every JIRA issue under the specific project directory. In other words, a JIRA project with a million issues with attachments must have a million subdirectories created under the project folder in the JIRA-Home directory, aka the data directory.

Picture a situation where you need to generate a data backup for all sorts of testing and validation before rolling out permanent changes to a production environment. One million subdirectories become two million, three million become six million, six million become twelve million and so forth. Basically, everything doubles during the backup phase, and as a result server performance eventually degrades and your JIRA instance slows down with it.

It’s worth noting that, because most file systems limit how many subdirectories a directory can hold, this becomes a total showstopper, especially when you’re unable to upload further attachments because the JIRA-Home directory has reached its maximum number of subdirectories. This issue has been raised and tracked under JRA-19873 since 30th November 2009.
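
To get a feel for how close a project directory is to such a limit, you could count its immediate subdirectories. The snippet below is only a rough sketch using the standard java.nio.file API; the class name is invented, and the directory you pass in (for example a project folder under your JIRA-Home attachments area) is an assumption about your own installation, not something JIRA provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class SubdirectoryCounter {
    // Counts the immediate subdirectories of dir (it does not recurse).
    static long countSubdirectories(Path dir) throws IOException {
        // Files.list opens a directory stream, so close it with try-with-resources
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isDirectory).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the project attachment directory as the first argument
        Path projectDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(projectDir + " has " + countSubdirectories(projectDir) + " subdirectories");
    }
}
```

On a directory with millions of entries the listing itself will take a while, which is a small taste of the problem described above.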

New attachment directory architecture

Thanks to the latest change in the attachment directory architecture, JIRA 7.0 has introduced a new level of subdirectories in the hierarchy. This level has been added between the project key subdirectory and the issue key subdirectory (see below) in order to limit the number of subdirectories that JIRA can create in any single directory under the JIRA-Home directory.

Directory   Issues
10000       1 ~ 10,000
20000       10,001 ~ 20,000
30000       20,001 ~ 30,000
40000       30,001 ~ 40,000
50000       40,001 ~ 50,000

The table above shows where attachment data is stored for each block of 10,000 issues. Each new subdirectory is named after the previous subdirectory number plus 10000, and it is created once the current directory has reached its maximum number of issue subdirectories (10,000), or more precisely when the 10,001st issue in the sequence is created.
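
The mapping from issue number to directory can be sketched in a few lines of Java. This is an illustration only: the class name, the relativePath helper and the exact PROJECT/BUCKET/ISSUE-KEY layout are assumptions drawn from the description above, and the bucket size of 10,000 is taken from the table. The arithmetic matches the formula quoted in the comments at the end of this post.

```java
class AttachmentBuckets {
    // Assumed from the table above: one bucket directory per 10,000 issues
    static final long BUCKET_SIZE = 10_000L;

    // Returns the bucket directory number for an issue number (1 -> 10000, 10001 -> 20000)
    static long bucketFor(long issueNumber) {
        if (issueNumber < 1) {
            throw new IllegalArgumentException("issue numbers start at 1");
        }
        return ((issueNumber - 1) / BUCKET_SIZE + 1) * BUCKET_SIZE;
    }

    // Hypothetical helper: builds PROJECT/BUCKET/ISSUE-KEY relative to the attachments root
    static String relativePath(String projectKey, long issueNumber) {
        return projectKey + "/" + bucketFor(issueNumber) + "/" + projectKey + "-" + issueNumber;
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(1));                 // 10000
        System.out.println(bucketFor(10_000));            // 10000
        System.out.println(bucketFor(10_001));            // 20000
        System.out.println(relativePath("ABC", 10_001));  // ABC/20000/ABC-10001
    }
}
```

With this scheme no bucket directory ever holds more than 10,000 issue subdirectories, and a project with a million issues needs only 100 bucket directories.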

The denouement

There you have it: yes, it is possible to handle tens of millions of attachments in JIRA from JIRA 7.0 onwards. If you’re currently on a version earlier than JIRA 7.0, you should check out this Atlassian Upgrade Documentation for detailed steps on how to upgrade your JIRA. Please be advised that existing attachment subdirectories are only migrated to the new attachment directory architecture on the fly, when users revisit the attachments of existing JIRA issues (i.e. upon clicking the attachment of an existing JIRA issue, a new attachment thumbnail is generated under a newly-created subdirectory named after the issue key). Any new attachments added to that issue thereafter are stored in this new subdirectory.

Upgrade to JIRA 7.x now

What are you waiting for? Click here to find out more about the latest version of JIRA 7 and master attachments in JIRA!

  • Jeff Turner

    Thanks for documenting this. It’s a pity Atlassian couldn’t be bothered.

    For completeness, the code that computes the new path is found in FileAttachments.java:

    final long issueNumber = IssueKey.from(issueKey).getIssueNumber();
    final long issueBucket = ((issueNumber - 1) / BUCKET_SIZE + 1) * BUCKET_SIZE;
    return Long.toString(issueBucket);