South African Master Deeds data leaked via MySQL data dump

Troy Hunt’s haveibeenpwned.com has published information about a leak of over 60m South African records which seem to originate from the Master Deeds office. The database file resided on a server hosted with a South African hosting provider and was publicly accessible in clear text, i.e. available to everyone. I personally do not believe that the data originated from the Master Deeds office alone; it was possibly sourced from a number of providers.

While the local media is starting to publish sensationalist stories about how the “hack” occurred, most of the information published is inaccurate, makes assumptions, blames the wrong people and massively overlooks some key aspects:

It was not a hack

The South African ECTA (Electronic Communications and Transactions Act, 2002) covers “Unauthorised access to, interception of or interference with data” in great detail. To anyone who accessed a database dump covering almost every person in South Africa, it would have been obvious that this access was illegal and unauthorised, even though the information had been made public.

For the less technically inclined: the database file resided on a server and could be downloaded with a browser via a URL such as http://IP-NUMBER:443. The server was hosted in South Africa:

It does not require “hacking” or elite security skills to find this type of information as in most cases search engines such as Google will index those backup files. Right now, Google has over 1,100 SQL dumps available on South African domains:

As tweeted, the files themselves resided in the root of the webserver as “masterdeeds.sql” and a compressed version, “masterdeeds.sql.gz”:

It’s unclear how many people had access to the above, but the files have a date of 2015-04-08. The compressed version was either copied 27 minutes later or it took 27 minutes to compress the larger file on the same day. The purple links indicate that the person accessing the server has visited those sub-directories.
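For anyone running a webserver, a check along the following lines would flag dump files sitting in the web root before a search engine or a stranger finds them. This is only a minimal sketch: the function name, the suffix list and the example path in the comment are my own assumptions and not taken from the affected server.

```python
from pathlib import Path

# Assumed list of file name endings that typically indicate a database dump.
DUMP_SUFFIXES = (".sql", ".sql.gz", ".sql.zip", ".sql.bz2", ".dump")


def find_exposed_dumps(web_root):
    """Return all files below web_root whose name looks like a database dump."""
    root = Path(web_root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.name.lower().endswith(DUMP_SUFFIXES)
    )


# Example usage (the path is hypothetical, adjust it to your own server):
#
#     for path in find_exposed_dumps("/var/www/html"):
#         print("Publicly reachable dump:", path)
```

Anything this reports should be moved out of the document root, since every file below it can usually be downloaded by anyone who knows or guesses the name.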

It is not a “massive” database for email spam

Out of the 60 million records in the database dump, only 2.2 million records contained email addresses and the file also contained about 9m deceased records. Based on checks via Troy Hunt’s website, I have established that my records are in the database (at least one email address last used around 2011) and I doubt that the email data is current/fresh enough to engage in email campaigns (there are other questionable services available in South Africa to acquire more up-to-date email databases).
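To put those numbers in proportion, here is a quick back-of-the-envelope calculation. It uses only the rounded figures quoted above, so the results are only as accurate as those inputs; the variable and function names are mine.

```python
# Rounded figures as quoted in this post; these are not exact counts.
TOTAL_RECORDS = 60_000_000
RECORDS_WITH_EMAIL = 2_200_000
DECEASED_RECORDS = 9_000_000


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


email_share = share(RECORDS_WITH_EMAIL, TOTAL_RECORDS)
deceased_share = share(DECEASED_RECORDS, TOTAL_RECORDS)

print(f"Records with an email address: {email_share:.1f}%")  # 3.7%
print(f"Deceased records: {deceased_share:.1f}%")            # 15.0%
```

In other words, fewer than 4% of the records carry an email address at all, while roughly 15% belong to deceased people.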

Was the leak disclosed responsibly?

Based on Troy Hunt’s website, the leak had already been submitted to Troy by a supporter of the project in March 2017, and the file itself has been on the server for an extended period of time:

Based on the little information available it does not appear that the person submitting the leak made any real effort for responsible disclosure and mitigation: The file was found on/before March 2017. Why did the “security expert” not inform the hosting provider (Hetzner) about the severity of the situation when the file was found/submitted?

Why were the hosting provider and the companies only contacted in late October 2017 to remove the file in question?

At this point in time it is not possible for consumers to check their records in the leak via a South African ID number. It will also be quite difficult to inform users about the leak as the data is most likely outdated.

Public disclosure and mitigation by company accountable

While the media is still in a frenzy to post sensationalist stories, I do expect that the companies responsible for the leak will contact every person affected via a public disclosure program where users can look up their details via email or ID number (or a combination of other details), and will then provide a facility for all users to protect themselves from future identity theft and misuse of sensitive information.

Since ID numbers (comparable to a social security number) are leaked, people in possession of the data can open bank accounts, apply for credit facilities or destroy someone’s credit history, and go as far as harassing people listed in those files (since the data contains address information, birth date, estimated income etc).

Due to the severity of the leak it would be necessary for the Department of Communication to intervene, as the database dump will also contain details of otherwise classified people (government staff, politicians, state-security personnel), and to ensure that everyone who accessed the data is investigated and that the company/persons responsible for the leak are held accountable.

Dracore (a data-services company providing information to credit bureaus and property companies) was accused by reporter Tefo Mohapi, has just issued the following statement: “Is Dracore Data Sciences Responsible For South Africa’s Largest Ever Data Leak?”, and is now engaging in legal action against companies that accused Dracore of being responsible.

I am certain that this “leak” will occupy the media and law enforcement for a few weeks, and I would be surprised if swift action is taken against the people responsible. Right now it seems that the finger-pointing and blame-shifting have started without a single entity taking responsibility or providing assistance for the consumers affected.

What can you do if you are in the leak?

Since only a small number of records contain email addresses, it is very unlikely that the majority of people will be able to verify whether they are part of the leak. My expectation is that every South African who owns a property or car, or has ever applied for a line of credit (via store cards, mobile phone contracts), would be in that file. The actual source of the data is still unclear, but to me it appears to be an aggregation of multiple sources.

While opening a criminal case is a possibility based on violations of ECTA, the chance of enforcement and prosecution is probably very slim (and pursuing it costly). My suggestion is to pre-emptively inform your bank that you have been part of the leak and that the bank should place a lock on facilities (opening of accounts, adding beneficiaries). It might be worthwhile to frequently check your credit history, as your details could be used to apply for new lines of credit or open bank accounts.

As far as I am aware, South Africa has no national service where South African ID lookups/usage can be blocked and there will be a significant chance for criminals abusing your details for various illegal activities: registering a car or company in your name, using your leaked details to change banking details or make purchases on credit in your name. The only protection you have is to be vigilant and not fall prey to any phishing campaigns and frequently monitor your bank statements.

IMPORTANT: Change all your passwords and possibly your email address. If your email is in the leak and is being used on online services, change the email to something else and while at it, also implement unique passwords per online service. This is probably the most proactive you can get in dealing with this issue as no-one else will assist you in case of identity theft.
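If you do not use a password manager that generates passwords for you, a few lines of Python can produce a distinct random password per service. This is a minimal sketch using Python's standard secrets module; the service names, the default length of 20 and the minimum of 12 characters are arbitrary assumptions of mine, not a recommendation from any standard.

```python
import secrets
import string

# Characters to draw from: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with a cryptographically secure source."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical list of services; each one gets its own password.
services = ["bank", "email", "online-shop"]
passwords = {service: generate_password() for service in services}

for service, password in passwords.items():
    print(f"{service}: {password}")
```

Store the result in a password manager rather than a text file; the point is simply that no two services share a password, so one leak does not open the others.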

Who owns the data and where did it come from – the NSSF?

I am basing my assumptions on screenshots and tweets from various people on the internet. The database schema uploaded by someone on PasteBin shows only personal information (such as ID number, gender, LSM, income information and address details). It is also unclear whether those 60m records all contain ID numbers and whether that data set is unique (i.e. if the data really originates from the Deeds office, a person with multiple homes could very well have multiple records in the file).
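The uniqueness question could be answered by anyone holding the data legitimately (for example a forensic investigator) with a simple count. The sketch below uses made-up sample rows; the field names id_number and address are hypothetical and not taken from the actual schema.

```python
from collections import Counter

# Made-up sample rows. The field names are hypothetical, not the real schema.
records = [
    {"id_number": "ID-0001", "address": "1 Example Street"},
    {"id_number": "ID-0001", "address": "2 Holiday Road"},   # same person, second home
    {"id_number": "ID-0002", "address": "3 Sample Avenue"},
    {"id_number": None, "address": "4 Unknown Lane"},        # record without an ID number
]


def summarise(rows):
    """Count records, records carrying an ID number, and distinct ID numbers."""
    ids = [row["id_number"] for row in rows if row.get("id_number")]
    counts = Counter(ids)
    return {
        "total_records": len(rows),
        "records_with_id": len(ids),
        "unique_ids": len(counts),
        "ids_with_multiple_records": sum(1 for n in counts.values() if n > 1),
    }


print(summarise(records))
```

If unique_ids turns out to be far smaller than total_records, the 60m figure overstates the number of individual people affected.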

The theory that those records are leaked from SARS (South African Revenue Service) is unlikely as SARS only has about 20m registered tax-payers and even if you include registered companies you would not go beyond 23m records. Aggregating those records from banks is also unlikely and I am pretty certain that the Deeds office would not have records of all individuals.

Any responsible reporter or IT publication should remember the discussion that Home Affairs would resell citizen information under the umbrella of the National Social Security Fund (NSSF), and I leave you with this from the ANC’s policy document:

“The DHA is the custodian of identity and it is building a comprehensive and accurate database of valuable data that is one of the largest globally.

The sale of identity services and products are another large revenue stream, with potential partners including GPW, the CSIR and private sector companies.”

The Elephant in the room: Full disclosure vs coordinated disclosure

The ECTA is quite vague when saying:

a person who intentionally accesses or intercepts any data without authority or permission to do so, is guilty of an offence.

While the first person having found the leak might not be guilty of an offence, anyone who subsequently accessed the Master Deeds leak did this with intent and knew very well about the sensitivity of the data. Investigations will possibly show (provided that Hetzner has kept forensic access logs) that a large number of people accessed the file in the days before it was taken down. That access was certainly illegal as it was intentionally accessing sensitive information about private citizens. Being a “journalist” should not indemnify you from this, especially if overzealous reporting resulted in the disclosure of the server’s IP address on a number of public forums.

The bigger argument/discussion should be how a company acquired implied consent from the owners of the data. Until POPIA is in effect, there is no legal framework to govern this properly. The ECTA only covers this via “Unsolicited goods, services or communications” and enforcement is almost impossible.

In my opinion, the whole reporting on the leak went wrong in a number of ways:

  • In March 2017, the person who found the data reported it to Troy Hunt with the expectation that something would be done. Some people seem to regard Troy Hunt as the “hacker”, which is not true. Troy is a well-known InfoSec expert.
  • The leak was reported on October 17/18 and, when the first report went online, the file was still available for download.

What should have happened:

  • Any security researcher worth his salt should have contacted the ISP and the company owning the domain/IP for immediate take-down when the file was first found
  • If the service-provider (Hetzner) does not assist, this can be escalated to ISPA to follow a formal TDN – https://ispa.org.za/tdn/
  • Any content indexed by Google/Bing/Yandex should have been removed via a removal request: https://support.google.com/legal/troubleshooter/1114905
  • Only publish the information about the leak once the leak has been removed from the server and forensic information has been secured (access logs, login history, MySQL access etc)

The IT publications running the story did so overzealously, to be first with a “hot story”, and ignored their mandate of responsible reporting and protecting the rights of millions of South Africans.

There is no excuse for the information having been on the internet for years. While some people can plead ignorance about IT security, any person would understand the gravity of “There is a file containing 60m records of South African citizens publicly available”.

If such information is found, I understand that in the South African context asking the police for help is pointless. As such I would rely on journalists and IT publications such as MyBroadband, ITWeb and others to do the responsible thing and follow proper process as part of a coordinated InfoSec disclosure process. Rushing out the story for clickbait has achieved the opposite: substantially more people accessed the files illegally than would have been the case had the story been held back for a few days until the data was removed and forensic logs had been secured.

Don’t get me wrong: I am in full support of Full Disclosure, but in many instances Coordinated Disclosure is necessary to limit the access/distribution of vulnerabilities and leaks – this is done in many instances to allow vendors to patch their systems before making information public.