Razormind provides several layers of security for all data. These can be understood within four areas:

The majority of big data engines rely at least in part on NoSQL platforms such as Riak or Hadoop. These come with significant advantages; however, they also present key security issues:

1. Transactional Integrity

– One of the most visible drawbacks of NoSQL is its soft approach to transactional integrity. Introducing complex integrity constraints into its architecture would defeat NoSQL’s primary objective of attaining better performance and scalability. Techniques like the Architectural Trade-off Analysis Method (ATAM) deal specifically with trade-offs between quality requirements in architectural decisions (for example, performance vs. security). This analytical method can be used to evaluate the level of integrity constraints that may be built into a core architectural kernel without significantly affecting performance.

2. Lax Authentication Mechanisms

– Across the board, NoSQL uses weak authentication techniques and weak password storage mechanisms. This exposes NoSQL to replay attacks and password brute-force attacks, resulting in information leakage. NoSQL uses HTTP Basic or Digest authentication, both of which are prone to replay and man-in-the-middle attacks. REST, which is another preferred communication mechanism, is also based on HTTP and is prone to cross-site scripting, cross-site request forgery, injection attacks, etc. Above all, NoSQL does not support integrating third-party pluggable modules to enforce authentication. By manipulating the RESTful connection definition, it is possible to get access to the handles and configuration parameters of the underlying database, and from there to the file system. Although some existing NoSQL databases offer authentication at the local node level, they fail to enforce authentication across all the cluster nodes.

3. Inefficient Authorization Mechanisms

– Authorization techniques differ from one NoSQL solution to another. Most of the popular solutions apply authorization at higher layers rather than enforcing it at lower layers. More specifically, authorization is enforced on a per-database level rather than at the collection level. There is no role-based access control (RBAC) mechanism built into the architecture, so user roles and security groups cannot be defined.
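
To illustrate what collection-level authorization could look like when enforced in an application or middleware layer, here is a minimal sketch. The class names (`Permission`, `Role`, `AccessController`) and the example databases and users are hypothetical and are not taken from any NoSQL product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permission:
    """One allowed action on one collection of one database."""
    database: str
    collection: str
    action: str  # for example "read" or "write"


@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)


class AccessController:
    """Hypothetical middleware check: maps users to roles, roles to permissions."""

    def __init__(self):
        self._roles = {}
        self._user_roles = {}

    def add_role(self, role):
        self._roles[role.name] = role

    def assign(self, user, role_name):
        if role_name not in self._roles:
            raise KeyError(f"unknown role: {role_name}")
        self._user_roles.setdefault(user, set()).add(role_name)

    def is_allowed(self, user, database, collection, action):
        # Deny by default: access requires an explicit matching permission.
        wanted = Permission(database, collection, action)
        return any(
            wanted in self._roles[name].permissions
            for name in self._user_roles.get(user, ())
        )


acl = AccessController()
acl.add_role(Role("analyst", {Permission("sales", "orders", "read")}))
acl.assign("alice", "analyst")
print(acl.is_allowed("alice", "sales", "orders", "read"))     # granted
print(acl.is_allowed("alice", "sales", "customers", "read"))  # other collection, denied
```

The check is made per collection rather than per database, so a role that can read one collection gains nothing on its neighbours.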

4. Susceptibility to Injection Attacks

– Easy-to-employ injection techniques allow backdoor access to the file system for malicious activities. Since NoSQL architecture employs lightweight protocols and mechanisms that are loosely coupled, it is susceptible to various injection attacks such as JSON injection, array injection, view injection, REST injection, GQL injection, schema injection, etc. For example, an attacker can use schema injection to inject thousands of columns into the database with data of the attacker’s choice. The impact of such an attack can range from a database with corrupted data to a DoS attack resulting in total unavailability of the database.
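
One common defence against JSON and array injection is to validate user-supplied query filters before they reach the database. The sketch below assumes a document-style query interface where filters arrive as JSON. The function name `parse_user_filter` and the field whitelist are hypothetical.

```python
import json

# Hypothetical whitelist of fields a client may filter on.
ALLOWED_FIELDS = {"username", "email"}


def parse_user_filter(raw_json):
    """Parse a client-supplied filter, accepting only flat string equality tests."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("filter must be a JSON object")
    clean = {}
    for key, value in data.items():
        # Unknown keys are rejected, which also blocks injected operator
        # keys and attempts to add arbitrary new fields.
        if key not in ALLOWED_FIELDS:
            raise ValueError(f"unexpected field: {key!r}")
        # Nested objects and arrays are how operator and array injection
        # are usually smuggled in, so only plain strings are accepted.
        if not isinstance(value, str):
            raise ValueError(f"field {key!r} must be a string")
        clean[key] = value
    return clean


print(parse_user_filter('{"username": "alice"}'))
```

A payload such as `{"username": {"$ne": null}}`, which in some query languages would match every user, raises `ValueError` here instead of being passed through.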

5. Lack of Consistency

– The inability to simultaneously enforce all three elements of the CAP theorem (consistency, availability, and partition tolerance) while in distributed mode undermines the trustworthiness of the churned results. As a result, users are not guaranteed consistent results at any given time, as each participating node may not be entirely synchronized with the node holding the latest image. Current hashing algorithms entrusted to replicate data across the cluster nodes crumple in the event of a single node failure, resulting in load imbalance among the cluster nodes.

6. Insider Attacks

– Lenient security mechanisms can be exploited to carry out insider attacks. These attacks can go unnoticed because of poor logging and log analysis methods, along with other rudimentary security mechanisms. As critical data is stowed away under a thin security layer, it is difficult to ensure that the data owners maintain control.


Big Data Analysis Best Practice


Data integrity needs to be enforced through an application or middleware layer. Passwords should never be left in the clear, whether at rest or in transit, but instead should be encrypted or hashed using secure hashing algorithms.
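
As a rough illustration of hashing passwords rather than storing them in the clear, the sketch below uses salted PBKDF2 from the Python standard library. The iteration count and the `iterations$salt$digest` storage format are assumptions chosen for the example, not requirements of any particular database.

```python
import hashlib
import hmac
import os

# Assumed work factor; tune it to the hardware that verifies logins.
ITERATIONS = 600_000


def hash_password(password, iterations=ITERATIONS):
    """Return 'iterations$salt_hex$digest_hex' for storage."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    iterations_text, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_text),
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


record = hash_password("correct horse")
print(verify_password("correct horse", record))
print(verify_password("wrong guess", record))
```

Because each record carries its own salt, two users with the same password get different stored values, and the slow key derivation makes brute-force attempts more expensive.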


Similarly, data stored in the database should never be left in the clear. Considering the already weak authentication and authorization techniques employed, it is vital to keep the data encrypted while at rest despite the associated performance impacts. Otherwise, malicious users who gain access to the file system could extract sensitive data from it directly. Hardware appliance-based encryption/decryption and bulk file-based encryption are faster and would alleviate some concern about the performance impacts of encryption. Of course, hardware-based encryption is not without its own criticism: it often leads to vendor lock-in and may rely on low-strength keys that attackers can exploit. In order to maintain confidentiality while in transit, it is good practice to use SSL/TLS to establish connections between the client and the server and also for communication across participating cluster nodes. Adopting such mechanisms to exchange mutually verifiable keys and establish trust would ensure data confidentiality while data is in transit. The NoSQL architecture should support pluggable authentication modules with the capacity to enforce security at all levels as the situation demands.
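
A mutually authenticated TLS setup of the kind described above might be configured as in the following sketch, which uses Python's standard `ssl` module. The helper names and the certificate file parameters are hypothetical. Real paths, and the decision to set TLS 1.2 as the floor, are assumptions to adapt per deployment.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side (or a cluster node accepting peers): require a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject peers without a valid certificate
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def make_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client side: verify the server and optionally present our own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


server_ctx = make_server_context()
client_ctx = make_client_context()
print(server_ctx.verify_mode, client_ctx.check_hostname)
```

In use, each context would wrap a socket with `wrap_socket`. With both sides presenting certificates signed by a shared authority, a node can check who its peer is before exchanging any data.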


Communication across clusters should also be better controlled so that each node can validate the trust level of other participating nodes before establishing a trusted communication channel. The use of intelligent hashing algorithms can ensure that the data is replicated consistently across nodes, even during node failures. All NoSQL products/solutions recommend that they be run in a trusted environment, which ensures that only trusted machines can access the database ports.
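
Consistent hashing is one well-known technique for keeping data placement stable when a node fails. It is offered here as an example of an "intelligent" hashing scheme, not as the method any specific product uses. The `HashRing` class, the node names, and the choice of 100 virtual nodes per machine are all assumptions made for this sketch.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 give a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            self._owners[position] = node
            bisect.insort(self._positions, position)

    def remove(self, node):
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def lookup(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[index]]


ring = HashRing(["node-a", "node-b", "node-c"])
owner_before = ring.lookup("customer:42")
ring.remove("node-b")  # simulate a node failure
print(owner_before, ring.lookup("customer:42"))
```

When a node is removed, only the keys it owned move to other nodes. Every other key stays where it was, which limits the rebalancing triggered by a single failure.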


Appropriate logging mechanisms, on-the-fly log analysis, and aggregation and correlation of log data could expose possible attacks. Applying fuzzing methods (providing invalid, unexpected or random inputs) can be an ideal way to expose possible vulnerabilities in NoSQL systems that use HTTP to communicate with clients.
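
As a small example of on-the-fly log correlation, the sketch below flags sources that produce many failed authentications within a short window, a pattern that may indicate a brute-force attempt. The event tuple layout, the `auth_failed` label, and the default thresholds are hypothetical.

```python
from collections import defaultdict, deque


def find_bruteforce_sources(events, threshold=5, window_seconds=60):
    """Return sources with at least `threshold` failed logins inside the window.

    `events` is an iterable of (timestamp_seconds, source, outcome) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    flagged = set()
    for timestamp, source, outcome in events:
        if outcome != "auth_failed":
            continue
        failures = recent[source]
        failures.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while timestamp - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(source)
    return flagged


events = [(t, "10.0.0.5", "auth_failed") for t in range(0, 50, 10)]
events.append((55, "10.0.0.7", "auth_ok"))
print(find_bruteforce_sources(events))
```

The same sliding-window idea extends to other correlated signals, such as a single account being tried from many sources.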


Appropriate data tagging techniques, with time stamps enforced through intelligent algorithms while piping data from its source, will defend against unauthorized modification of data. These techniques will also preserve the authenticity of the data stored.
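
One possible way to tag data with a tamper-evident time stamp as it leaves its source is a keyed hash (HMAC) over the payload and timestamp together. This is a sketch under assumptions: the record layout, the function names, and the idea that source and verifier share a secret key are all illustrative.

```python
import hashlib
import hmac
import json
import time


def _canonical(payload, timestamp):
    # Stable serialization so source and verifier hash identical bytes.
    body = {"payload": payload, "timestamp": timestamp}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def tag_record(payload, key, now=None):
    """Attach a timestamp and an HMAC-SHA256 tag covering both fields."""
    timestamp = time.time() if now is None else now
    tag = hmac.new(key, _canonical(payload, timestamp), hashlib.sha256).hexdigest()
    return {"payload": payload, "timestamp": timestamp, "tag": tag}


def verify_record(record, key):
    """Recompute the tag; any change to payload or timestamp invalidates it."""
    expected = hmac.new(
        key, _canonical(record["payload"], record["timestamp"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


key = b"demo-key-not-for-production"  # hypothetical shared secret
record = tag_record({"sensor": "s1", "value": 42}, key)
print(verify_record(record, key))
record["payload"]["value"] = 43  # simulate unauthorized modification
print(verify_record(record, key))
```

Because the tag covers the timestamp as well as the payload, backdating a record is detected in the same way as altering its contents.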