Seeding iCloud Core Data Store

As I was developing Speech Timer 2 to synchronize over Core Data between the OS X and iOS versions, seeding the data was a big problem. The previous version of the application didn’t use iCloud, let alone Core Data: it stored everything in a single serialized property list file. Since it didn’t need to sync, providing pre-populated data was quite simple: just check whether the property list file exists, and if it doesn’t, create one and fill it with the initial data set. The rewrite to use Core Data also followed the same approach: check whether the persistent store exists, and if it doesn’t, insert some objects for the user to use as a starting point.

The Problem

My motivation for using Core Data in Speech Timer 2 was so that it could seamlessly synchronize changes between the iOS and OS X editions over iCloud. Carrying over the single property list approach would not achieve bi-directional synchronization and merging of records, since iCloud would just transport the entire file, overwriting any intermediate changes. However, bringing sync along also means that the original approach of providing a starting data set is no longer workable.

Because there is no centralized location that understands the data, there is no way to tell, while the data store is being opened, whether this is the first time an iCloud data store has been initialized. Sure, you could check whether the data store already has some data and base the decision on that, but your app running on another device could have done the same thing while offline and pre-populated its own data store. If this happens, you’ll have duplicated seed data.

The Solution

What you’ll need to do is to accept this fact and design your schema around it. You’ll need to always have a way to de-duplicate seeded entries and run this process as part of your merge algorithm. That is, whenever an instance of your application receives fresh changes from iCloud, try to see whether they are seed data from another device and remove any duplicates of these.

Furthermore you’ll need to be able to tell seeded data records apart from the ones entered by the user. Take extra care not to delete user-entered data or otherwise face the wrath of your users, increased support load, and worse – abandonment of your app. Your de-duplication algorithm should not touch any data records that are not inserted automatically by the application as part of its seeding process. This could be a simple boolean attribute attached to your data records, or a more elaborate combination of attributes that you’ll see in my example later in this post.
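
As a minimal sketch of the simpler option, a boolean flag can be checked with a predicate. The attribute name isSeeded, the helper SeededRecords, and the dictionary-based records are hypothetical stand-ins for managed objects; none of them are part of Speech Timer’s model. The seeding code would set the flag to YES, and it would have to be cleared as soon as the user edits the record.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns only the records the application seeded itself.
// Each record is a dictionary standing in for a managed object, and "isSeeded"
// is an assumed boolean attribute that only the seeding code sets to YES.
static NSArray<NSDictionary *> *SeededRecords(NSArray<NSDictionary *> *records)
{
    // Records without the flag, or with the flag set to NO, do not match.
    NSPredicate *seededOnly = [NSPredicate predicateWithFormat:@"isSeeded == YES"];
    return [records filteredArrayUsingPredicate:seededOnly];
}
```

With Core Data, the same predicate format could be assigned to a fetch request instead of filtering an array in memory.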

Last but not least, you should only initialize data once per device per account. This is to minimize the chance of your users seeing duplicate preloaded data in your application’s user interface. Remember that duplicates are inevitable, and there is always a chance that your users will see duplicated data show up on the app’s screen, no matter how slim that chance is or how briefly the duplicates appear before your cleanup algorithm removes them. Hence you don’t want to seed data any more often than you have to.

An Example

As it stands, Speech Timer provides a number of pre-set speech types (in this application’s domain, a speech type is like a category but is not coupled to any master-detail record hierarchy). Moreover, the user is free to modify or even remove these seeded data rows. Once the user has modified a predefined speech type, the record “belongs” to the user and is treated as if the user had entered the item themselves.

Identifying system-created records

I solved this problem by having createDate and updateDate attributes on each speech type. System-seeded data will have createDate initialized but updateDate set to nil, whereas user-entered data will have both attributes filled. This is accomplished by initializing createDate in awakeFromInsert and setting updateDate in willSave, but only when the record was updated in the current transaction. One caveat is that if willSave changes an attribute during a save, it will be called again until no more attributes change. So I had to do a little dance between willSave and didSave and keep the last update date in a transient instance variable.

-(void)awakeFromInsert
{
    [super awakeFromInsert];
    // ... other data setup here ...

    // initialize createDate to the current timestamp.
    self.primitiveCreateDate = [NSDate date];
}


-(void)willSave
{
    [super willSave];
    if ([self isUpdated] && ![self isDeleted]) {
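        // new records always have a nil updateDate.
        // setting updateDate below makes willSave run again; on that second pass the
        // cached date equals updateDate, so nothing changes and the cycle ends.
        if (![_lastUpdateDate isEqualToDate:self.updateDate]) {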
            if (!_lastUpdateDate) {
                _lastUpdateDate = [NSDate date];
            }
            self.updateDate = _lastUpdateDate;
        }
    }
}

-(void)didSave
{
    [super didSave];
    // reset last update variable for the next save cycle.
    _lastUpdateDate = nil;
}

De-duplicating records

Speech type records are primarily identified by their names. Hence it’s logical to use the name as the primary attribute for de-duplicating. If there are duplicates, the application will only keep the latest seeded data. Hence, should a more recent version of the application seed the same speech type name with different values for the other attributes, the newer seeded data will be retained. As always, de-duplication only works on speech types that have not been modified by the user.

-(void) cleanupSpeechTypes
{
    NSManagedObjectContext* managedObjectContext = self.managedObjectContext;
    
    static NSArray* sortDescriptors;
    static NSPredicate* predicate;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // use speechTypeName as the key attribute to identify duplicates and createDate to resolve them.
        sortDescriptors = @[
            [NSSortDescriptor sortDescriptorWithKey:BSSpeechTypeAttributes.speechTypeName ascending:YES],
            [NSSortDescriptor sortDescriptorWithKey:BSSpeechTypeAttributes.createDate ascending:YES]
        ];
        // check for createDate filled but no updateDate: these are system-created records that are not touched by the user
        predicate  = [NSPredicate predicateWithFormat:@"createDate != nil && updateDate == nil"];
    });
    NSFetchRequest* fetch = [NSFetchRequest new];
    fetch.predicate = predicate;
    fetch.sortDescriptors = sortDescriptors;
    fetch.entity = [BSSpeechType entityInManagedObjectContext:managedObjectContext];
    
    BSSpeechType* lastOne = nil;
    NSMutableSet* objectsToDelete = nil;
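    NSError* fetchError = nil;
    NSArray* result = [managedObjectContext executeFetchRequest:fetch error:&fetchError];
    if (!result) {
        // the fetch failed; log it and leave the store untouched.
        NSLog(@"Failed to fetch seeded speech types: %@", fetchError);
        return;
    }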
    for (BSSpeechType* current in result) {
        if (lastOne) {
            if (lastOne != current && [lastOne.speechTypeName isEqualToString:current.speechTypeName]) {
                // current is a newer seeded record with exactly the same type name, so delete the older one (lastOne).
                if (!objectsToDelete) {
                    objectsToDelete = [NSMutableSet new];
                }
                [objectsToDelete addObject:lastOne];
            }
        }
        lastOne = current;
    }
    
    if (objectsToDelete) {
        for (NSManagedObject* obj in objectsToDelete) {
            [managedObjectContext deleteObject:obj];
        }
        // .. some other stuff to let the UI know that we've just deleted a few records ...
    }
}
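
To make the behaviour of that scan easier to check, here is a minimal sketch of the same idea over plain dictionaries instead of managed objects. The function name DuplicateSeededRecords is hypothetical and the dictionaries merely stand in for BSSpeechType records; only the attribute names and the predicate come from the code above.

```objc
#import <Foundation/Foundation.h>

// Returns the untouched seeded records that are superseded by a newer seeded
// record with the same name. Records that have an updateDate (user-owned) are
// filtered out first and are therefore never returned.
static NSArray<NSDictionary *> *DuplicateSeededRecords(NSArray<NSDictionary *> *records)
{
    // Keep only system-created records: createDate set, updateDate missing.
    NSPredicate *untouched = [NSPredicate predicateWithFormat:@"createDate != nil && updateDate == nil"];
    // Sort by name first, then oldest to newest within each name.
    NSArray<NSDictionary *> *sorted = [[records filteredArrayUsingPredicate:untouched] sortedArrayUsingDescriptors:@[
        [NSSortDescriptor sortDescriptorWithKey:@"speechTypeName" ascending:YES],
        [NSSortDescriptor sortDescriptorWithKey:@"createDate" ascending:YES]
    ]];

    NSMutableArray<NSDictionary *> *duplicates = [NSMutableArray array];
    NSDictionary *previous = nil;
    for (NSDictionary *current in sorted) {
        // Same name as the previous record: the previous one is older, so mark it.
        if (previous && [previous[@"speechTypeName"] isEqualToString:current[@"speechTypeName"]]) {
            [duplicates addObject:previous];
        }
        previous = current;
    }
    return duplicates;
}
```

Given two untouched “Toast” records seeded at different times plus a third “Toast” that the user has edited, only the older untouched record is returned for deletion.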

The cleanup logic above runs primarily in response to two notifications from iCloud:

  • When the data store just changed: NSPersistentStoreCoordinatorStoresDidChangeNotification
  • When new data have been merged from iCloud: NSPersistentStoreDidImportUbiquitousContentChangesNotification

// handler for BSCoreDataControllerDidMergeFromUbiquitousContentChanges
-(void) didMergeFromUbiquituousContentChanges:(NSNotification*) notification
{
    // a safety check to ensure that this notification is for _our_ core data stack
    NSManagedObjectContext* managedObjectContext = self.managedObjectContext;
    if ([notification.object managedObjectContext] != managedObjectContext) {
        return;
    }

    // cleanup duplicate seeded data if needed
    [self cleanupSpeechTypes];
    
    // ... determine if the UI needs to refresh some data and then raise notification as necessary ...
}

// handler for BSCoreDataControllerStoresDidChangeNotification
-(void) storesDidChange:(NSNotification*) notification
{
    // again, safety check
    if ([notification.object managedObjectContext] != self.managedObjectContext) {
        return;
    }
    NSNumber* transitionType = notification.userInfo[NSPersistentStoreUbiquitousTransitionTypeKey];
    switch ([transitionType unsignedIntegerValue]) {
        case NSPersistentStoreUbiquitousTransitionTypeAccountAdded:
            // new account, first see if we need to seed data and then cleanup any duplicates.
            [self setupSpeechTypes]; 
            [self cleanupSpeechTypes];
            break;
        default:
            // otherwise just cleanup duplicates.
            [self cleanupSpeechTypes];
            break;
    }
    // ... again notify the UI if necessary ...    
}
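
The two handlers above still have to be registered with the notification center, a step this post doesn’t show. The following is only a sketch of how that could look: the class name SeedDataObserver, its two counters, and the string values of the notification name constants are made up so that the snippet compiles on its own. In the real application the constants come from BSCoreDataController and the handlers do the seeding and cleanup shown above.

```objc
#import <Foundation/Foundation.h>

// Assumed values; the real constants are declared by BSCoreDataController.
static NSString *const BSCoreDataControllerStoresDidChangeNotification =
    @"BSCoreDataControllerStoresDidChangeNotification";
static NSString *const BSCoreDataControllerDidMergeFromUbiquitousContentChanges =
    @"BSCoreDataControllerDidMergeFromUbiquitousContentChanges";

// Hypothetical observer that owns the two handlers.
@interface SeedDataObserver : NSObject
@property (nonatomic) NSUInteger storeChangeCount;
@property (nonatomic) NSUInteger mergeCount;
- (void)startObserving;
@end

@implementation SeedDataObserver

- (void)startObserving
{
    NSNotificationCenter *center = [NSNotificationCenter defaultCenter];
    // object:nil accepts the notification from any sender; the handlers can
    // still check notification.object, as the safety checks above do.
    [center addObserver:self
               selector:@selector(storesDidChange:)
                   name:BSCoreDataControllerStoresDidChangeNotification
                 object:nil];
    [center addObserver:self
               selector:@selector(didMergeFromUbiquituousContentChanges:)
                   name:BSCoreDataControllerDidMergeFromUbiquitousContentChanges
                 object:nil];
}

- (void)storesDidChange:(NSNotification *)notification
{
    // Stub: the real handler seeds and de-duplicates; this one only counts calls.
    self.storeChangeCount += 1;
}

- (void)didMergeFromUbiquituousContentChanges:(NSNotification *)notification
{
    // Stub: the real handler runs the cleanup; this one only counts calls.
    self.mergeCount += 1;
}

@end
```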

Note that my Core Data manager class, BSCoreDataController, provides convenience notifications so that you can handle the two notifications above at a higher level in your application logic. Whenever it receives NSPersistentStoreCoordinatorStoresDidChangeNotification, it in turn posts BSCoreDataControllerStoresDidChangeNotification on the main queue. Likewise, whenever it receives NSPersistentStoreDidImportUbiquitousContentChangesNotification, it merges the new data first and then posts BSCoreDataControllerDidMergeFromUbiquitousContentChanges on the main queue so that your application can perform data-specific logic with the changes. Remember that NSNotificationCenter calls your observers on the same thread as the one that posts the notification. Through testing, I found that these iCloud notifications come from secondary threads, so their handlers cannot directly manipulate the NSManagedObjectContext object that the user interface is using.

-(void) persistentStoreCoordinatorDidImportUbiquitousContentChanges:(NSNotification*) notification
{
    NSManagedObjectContext* moc = [self managedObjectContext];
    [moc performBlock:^{
        [moc mergeChangesFromContextDidSaveNotification:notification];
        [[NSNotificationCenter defaultCenter] postNotificationName:BSCoreDataControllerDidMergeFromUbiquitousContentChanges object:self userInfo:notification.userInfo];
    }];
}

-(void) persistentStoreCoordinatorStoresDidChange:(NSNotification*) notification
{
    NSDictionary* userInfo = notification.userInfo;
    [[NSOperationQueue mainQueue] addOperationWithBlock:^{
        [[NSNotificationCenter defaultCenter] postNotificationName:BSCoreDataControllerStoresDidChangeNotification object:self userInfo:userInfo];
    }];
}

(You can get the source code of BSCoreDataController from GitHub.)

Minimizing duplicate seed data

The data seeding logic makes use of NSUserDefaults to determine whether setup has already been done on the current device. In addition, it keeps track of the iCloud accounts for which data was seeded, so that if the user changes iCloud accounts, the new account gets seed data as well.

-(void) setupSpeechTypes
{
    NSManagedObjectContext* managedObjectContext = self.managedObjectContext;
    if (!managedObjectContext) {
        return;
    }
    
    // first check user defaults and see whether we've done preloading yet.
    id < NSObject, NSCopying, NSCoding > ubiquityIdentityToken = [[NSFileManager defaultManager] ubiquityIdentityToken];
    if (!ubiquityIdentityToken) {
        // don't use [NSNull null] as it can't be stored inside a property list.
        ubiquityIdentityToken = @NO;
    }
    
    NSUserDefaults* userDefaults = [NSUserDefaults standardUserDefaults];
    NSArray* setupCompletion = [userDefaults objectForKey:BSSpeechConfigSetupCompletedForAccountKey];
    if ([setupCompletion containsObject:ubiquityIdentityToken]) {
        return; // setup already done for this account, exit here.
    }
    
    // ... data seeding logic here ... 
    
    // mark setup completion
    NSMutableArray* setupCompletionMark = [NSMutableArray arrayWithArray:setupCompletion];
    [setupCompletionMark addObject:ubiquityIdentityToken];
    [userDefaults setObject:setupCompletionMark forKey:BSSpeechConfigSetupCompletedForAccountKey];
}
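
One thing to be aware of: the ubiquity identity token is typed as an opaque object that supports NSCopying and NSCoding, with no promise that it is a property list type. The code above stores it in user defaults directly. A more defensive variation, sketched here under that assumption rather than taken from Speech Timer, archives the token to NSData before storing it and decodes the stored entries again for comparison. The helper names ArchivedToken and ArchivedTokensContain are hypothetical, and the NSKeyedArchiver and NSKeyedUnarchiver calls used require OS X 10.13 or iOS 11 and later.

```objc
#import <Foundation/Foundation.h>

// Archives an opaque NSCoding token into NSData, which is a property list type
// and can therefore be stored inside an array in NSUserDefaults.
static NSData *ArchivedToken(id<NSObject, NSCoding> token)
{
    NSError *error = nil;
    NSData *data = [NSKeyedArchiver archivedDataWithRootObject:token
                                         requiringSecureCoding:NO
                                                         error:&error];
    if (!data) {
        NSLog(@"Could not archive token: %@", error);
    }
    return data;
}

// Returns YES when one of the archived entries decodes to an object equal to `token`.
static BOOL ArchivedTokensContain(NSArray<NSData *> *archivedTokens, id<NSObject, NSCoding> token)
{
    for (NSData *data in archivedTokens) {
        NSKeyedUnarchiver *unarchiver = [[NSKeyedUnarchiver alloc] initForReadingFromData:data error:NULL];
        // The token's concrete class is not documented, so secure coding is turned off here.
        unarchiver.requiresSecureCoding = NO;
        id stored = [unarchiver decodeObjectForKey:NSKeyedArchiveRootObjectKey];
        if ([stored isEqual:token]) {
            return YES;
        }
    }
    return NO;
}
```

In setupSpeechTypes, the containsObject: check would then become a call to ArchivedTokensContain, and the completion mark would store ArchivedToken(ubiquityIdentityToken) instead of the token itself.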

Any Feedback?

So what do you think? What corner cases do I still need to cover? Is there a better way to do this? Please let me know.

PS: I used the awesome mogenerator for my Core Data classes. In case you wonder about some of the convenience methods, like entityInManagedObjectContext, they come from mogenerator.


