Prediction of android ransomware with deep learning model using hybrid cryptography


This section describes the proposed design in detail. First, the input APK files are preprocessed to extract features. Optimal features are then selected by means of the squirrel search optimization (SSO) process. Next, a DL-based model, the Adaptive deep saliency AlexNet classifier, detects and classifies the data as malicious or normal. Data detected as non-malicious are stored on a cloud server. For secure storage in the cloud, a hybrid cryptographic model (Hybrid Homomorphic ECC & Blowfish) is employed, which includes key computation and key generation. The cryptographic scheme encrypts and decrypts the data, and upon user request the app response returns the decrypted result. The overall workflow is illustrated in Fig. 1.

Fig. 1

Schematic flow of suggested design.

Experimental setup

The proposed system is evaluated on two publicly available datasets: the permission-based dataset at https://github.com/harrypro02/Android-Malware-Permission-Based-Dataset and maldroid-2020. These datasets contain permissions used for Android malware identification on mobile devices, along with parameters such as storage, image, opcode, and system call. The data are then preprocessed to train the DL model. The dataset is reported to consist of 15,000 entries in 5 rows and 1204 columns of malware data, and the normal data are likewise preprocessed for the training model. After training, feature extraction is done with the SSO algorithm to improve model performance, with a learning rate of 0.01 and L2 regularization to reduce the loss after feature extraction. Cryptographic key management is handled with homomorphic ECC and Blowfish encryption so that security is maintained throughout the process of decrypting the data affected by the ransomware. After encryption, model performance is analyzed over 50 epochs of training with an 80/20 train/test split using the adaptive deep saliency AlexNet, which is configured with a dense layer, the Adam optimizer, and the ReLU activation function to detect Android malware with higher accuracy than traditional deep learning models. Overall, the model combines deep learning and cryptographic methods to detect Android malware with high accuracy; its design is described in the sections below.

Input data cleaning

First, the input data (the application to be installed) are preprocessed to remove redundant and unnecessary data. Preprocessing in this model aims to evaluate the capability to protect against ransomware code embedded in Android apps. To this end, a special key for preprocessing the bytecode of Android apps is used as a structure. A hybrid cryptography model is employed to determine the features that are significant for detecting Android malware. Before the detection algorithm is applied, the dataset is preprocessed into an indexed format. This step converts dex files to a suitable APK format, which includes dex-file compilation as a setup that adapts to the APK. To maintain design compatibility with mobile devices, modules from the JVM of Omni Rom (OR) are employed in this transformation. Once the conversion is complete, the model analyses the text segment of every APK file to extract opcode instructions. The model covers both the detection of ransomware and the securing of cloud server data, which follows the entire crypto-code. An APK image is taken for each application, and the respective bytecode is extracted from the .apk segment. The model then calculates the occurrence of the data and passes the result to the feature extraction process.

Feature extraction & SSO (squirrel search optimization) optimization-based selection

Classifying malware on the basis of grey-scale image extraction is a recent approach that has proved to be an effective tool for static analysis. A grey-scale image is an image expressed in shades of grey. According to a logarithmic relationship, the brightness range between white and black is divided into 256 grades. Differences in the underlying data produce corresponding differences in grey level and texture, which are reflected in the visual field. To exploit the texture differences of malware, the Interactive Disassembler (IDA) is first employed to split binary files into units of eight bits each, and each unit is converted to an unsigned integer in the range 0–255. In a grey-scale image, 0 and 255 signify black and white, respectively. Finally, the transformed file is mapped to a matrix termed the ‘grey scale matrix’. The matrix width is typically initialized to \(2^{n}\); in this model, n is equal to 8. For feature expression, the grey-scale matrix is then mapped to a one-dimensional vector termed the ‘grey scale vector’.
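The byte-to-image mapping described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and zero-padding the last row is an assumption, since the text does not say how an incomplete final row is handled.

```python
def bytes_to_grey_matrix(data: bytes, n: int = 8) -> list[list[int]]:
    """Map raw bytes to a grey-scale matrix of width 2**n (256 for n = 8)."""
    width = 2 ** n
    values = list(data)              # each byte is an unsigned integer in 0..255
    pad = (-len(values)) % width     # assumption: pad the last row with zeros
    values += [0] * pad
    return [values[i:i + width] for i in range(0, len(values), width)]


def grey_matrix_to_vector(matrix: list[list[int]]) -> list[int]:
    """Flatten the grey-scale matrix row by row into a one-dimensional vector."""
    return [value for row in matrix for value in row]


# Hypothetical input standing in for the bytes of a disassembled binary.
sample = bytes(range(256)) * 2 + b"\x01\x02"
matrix = bytes_to_grey_matrix(sample)
vector = grey_matrix_to_vector(matrix)
print(len(matrix), len(matrix[0]), len(vector))
```

Keeping the width fixed at 256 means that files of different sizes differ only in the number of rows, which is what makes the flattened vectors comparable after a later length-normalization step.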

The extracted features are then subjected to optimal feature selection. The SSO algorithm updates the position of each individual according to the current season, the type of individual, and the appearance of predators.

Initialization of population: Let \(\mathcal{N}\) be the number of individuals, and let \({SS}_{U}\) and \({SS}_{L}\) be the upper and lower bounds of the search space. The individuals are generated randomly according to Eq. (1):

$$SS_{I} = SS_{L} + {\mathcal{R}}\left( {1,D} \right) \times \left( {SS_{U} - SS_{L} } \right)$$

(1)

\({SS}_{I}\) signifies the ith individual \(\left(i=1,\dots ,\mathcal{N}\right)\), \(\mathcal{R}\) signifies a random number between 0 and 1, and D denotes the problem dimension.
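As a minimal sketch of Eq. (1), the population can be initialized as below. The function name, the scalar bounds, and the optional seeded generator are illustrative assumptions; the paper does not specify them.

```python
import random


def init_population(n_individuals, dim, lower, upper, rng=None):
    """Eq. (1): SS_I = SS_L + R(1, D) * (SS_U - SS_L) for each individual."""
    rng = rng or random.Random()
    return [
        [lower + rng.random() * (upper - lower) for _ in range(dim)]
        for _ in range(n_individuals)
    ]


# Hypothetical sizes: 5 squirrels in a 3-dimensional search space on [0, 1].
population = init_population(5, 3, 0.0, 1.0, rng=random.Random(0))
print(len(population), len(population[0]))
```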

Population classification

For a minimization problem, SSO places one squirrel on every tree, with \(\mathcal{N}\) squirrels in total. The fitness values of the population are ranked in ascending order. The squirrels are divided into three kinds: squirrels located on hickory trees \({S}_{H}\), squirrels on acorn trees \({S}_{A}\), and squirrels on normal trees \({S}_{N}\). To find the best food source, the destination of \({S}_{A}\) is \({S}_{H}\), and the destination of \({S}_{N}\) is chosen randomly as either \({S}_{H}\) or \({S}_{A}\).

Position update

The squirrel’s position is updated according to Eqs. (2) and (3).

$$\left\{ {\begin{array}{*{20}l} {SS_{I}^{t + 1} = SS_{I}^{t} + {\mathcal{G}} \times {\mathcal{C}} \times \left( {SS_{H}^{t} - SS_{I}^{t} } \right)} \hfill & {if\;{\mathcal{R}} > {\mathcal{P}}_{AP} } \hfill \\ {random\;location} \hfill & {Otherwise} \hfill \\ \end{array} } \right.$$

(2)

$$\left\{ {\begin{array}{*{20}l} {SS_{I}^{t + 1} = SS_{I}^{t} + {\mathcal{G}} \times {\mathcal{C}} \times \left( {S_{AI}^{t} - SS_{I}^{t} } \right) } \hfill & {if\;{\mathcal{R}} > {\mathcal{P}}_{AP} } \hfill \\ {random\;location} \hfill & {Otherwise} \hfill \\ \end{array} } \right.$$

(3)

\({\mathcal{R}}\) designates a random number and ‘t’ signifies the current iteration. \({\mathcal{P}}_{AP}\) denotes the probability of predator appearance, which is set to 0.1. If \(\mathcal{R}>{\mathcal{P}}_{AP}\), no predator is present and the squirrel glides through the forest in search of food. If \(\mathcal{R}\le {\mathcal{P}}_{AP}\), predators may appear and the squirrels must reduce their foraging activity because they are at risk; in that case, the squirrel’s position is relocated randomly. \({\mathcal{C}}\) is a constant with value 1.9 and \(\mathcal{G}\) signifies the gliding distance. \({S}_{AI}^{t}\) denotes an individual squirrel selected randomly from \({S}_{A}\). The gliding distance is computed as in Eq. (4):

$${\mathcal{G}} = \frac{{{\mathcal{G}}_{H} }}{{\tan \left( \theta \right) \times {\mathcal{S}}}}$$

(4)

\({\mathcal{G}}_{H}\) is a constant with value 8 and \(\mathcal{S}\) is a constant with value 18. \(\theta\) signifies the gliding angle. Once the iteration count exceeds the maximum number of iterations, the individuals stop moving; otherwise, the above steps are repeated.
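The update rule of Eqs. (2) to (4) can be sketched as below, using the constants stated in the text. The function names are hypothetical, and how the gliding angle \(\theta\) is chosen is an assumption: the text gives no rule for it, so it is passed in as a parameter. The `target` argument stands for \(SS_{H}^{t}\) in Eq. (2) or \(S_{AI}^{t}\) in Eq. (3).

```python
import math
import random

G_H = 8.0      # constant in Eq. (4)
S_CONST = 18.0 # constant in Eq. (4)
C_CONST = 1.9  # constant C in Eqs. (2) and (3)
P_AP = 0.1     # probability of predator appearance


def gliding_distance(theta):
    """Eq. (4): G = G_H / (tan(theta) * S)."""
    return G_H / (math.tan(theta) * S_CONST)


def update_position(current, target, theta, lower, upper, rng):
    """Eqs. (2)/(3): glide towards the target, or relocate randomly."""
    if rng.random() > P_AP:
        # No predator: move towards the target tree.
        g = gliding_distance(theta)
        return [x + g * C_CONST * (t - x) for x, t in zip(current, target)]
    # Predator present: random location within the search bounds.
    return [lower + rng.random() * (upper - lower) for _ in current]


rng = random.Random(0)
new_position = update_position([0.2, 0.4], [0.9, 0.1], math.pi / 4, 0.0, 1.0, rng)
print(new_position)
```

Passing the random generator explicitly keeps the sketch reproducible, which is convenient when comparing feature subsets across runs.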

Adaptive deep saliency AlexNet classifier

Once the features are selected, a classification mechanism is employed to recognize attacks. Classification is the final stage of the detection mechanism, and detection must be completed before the security mechanism is applied. The adaptive deep saliency AlexNet classifier is employed to label the data as malicious or benign. The dataset is divided into two parts, one for training and one for testing. In the training phase, the dataset vectors are trained with their respective classes, and the output identifies whether the input image is benign or malicious. The classifier is trained and tested with the RBF kernel function to attain a better outcome.

The suggested Adaptive deep saliency AlexNet classifier model detects whether the data are malicious. The data from the words before step t of the CNN architecture are also used as input when the word at step t is processed. The earlier cell data are gathered from the cells, and the words are given as inputs. One reference is the image repeated over a single cell; the sequence of cells in the architecture is another. As in natural language processing problems, the amount of text in each data example is not fixed. To process each text, the sequences are therefore reduced to a fixed length: a sequence shorter than the desired length is padded with a fill value, and a sequence longer than that length is truncated, with the remainder discarded.
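The length normalization described at the end of this paragraph can be illustrated with a short sketch. The function name, the fill value of 0, and padding at the end of the sequence are assumptions; the text does not specify them.

```python
def fit_length(sequence, target_len, pad_value=0):
    """Pad a short sequence with pad_value, or truncate a long one."""
    if len(sequence) >= target_len:
        return list(sequence[:target_len])   # discard the remainder
    return list(sequence) + [pad_value] * (target_len - len(sequence))


# Hypothetical opcode-index sequences of different lengths.
print(fit_length([7, 3, 9], 5))
print(fit_length([7, 3, 9, 4, 1, 8], 5))
```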

The AlexNet CNN model comprises five convolution layers, two fully connected layers, and one recurrent layer. The CNN layers learn middle-level visual patterns, similar to the first five layers of the seven-layer AlexNet. The RNN layer learns the spatial dependencies among the middle-level visual patterns. In the final two fully connected layers, the RNN outputs are gathered and a global image representation is learned. A SoftMax layer is then applied for N-way classification (N signifies the number of classes).
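The final N-way SoftMax step can be sketched in isolation as follows. This covers only the output layer, not the convolutional or recurrent layers; the function names, the label names, and the example scores are illustrative assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels=("normal", "malicious")):
    """Return the label with the highest SoftMax probability."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]


# Hypothetical scores from the last fully connected layer (N = 2 classes).
print(softmax([0.2, 1.5]))
print(predict([0.2, 1.5]))
```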

Algorithm 1

Adaptive deep saliency AlexNet classifier.

Once the data are classified as attack or normal, the normal data are stored on a cloud server. On the cloud server, the data should be encrypted with a key to enable secure cloud storage. For secure storage in the cloud, key computation (key generation) is carried out first, followed by a hybrid cryptographic approach for encryption and decryption. This is explained in the subsequent sections.

Computation of key using K-Centers Diffie-Hellman (KC-DH)

For secure storage in the cloud, the data should be encrypted and protected with a key. For key computation, the K-centers Diffie-Hellman (KC-DH) approach is employed, in which a key is generated so that the parties share a private key with which they can exchange data over an insecure channel. However, the private key is not fixed: a random private key must be generated for each data-sharing transaction. The algorithm for this key computation approach is shown below:

Algorithm 2

KC-DH protocol for key computation.

This is the discrete logarithm problem, which is computationally infeasible for large p. Computing a discrete logarithm modulo p takes roughly the same amount of time as factoring a product of two primes of about the same size as p, which is what the security of the RSA cryptosystem relies on. Therefore, this protocol (ECC-DH) is roughly as secure as RSA.
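For orientation, the classic Diffie-Hellman exchange modulo a prime p, on which the discrete logarithm argument above rests, can be sketched as below. This is not the KC-DH algorithm itself, whose K-centers step is not reproduced in the text. The prime \(2^{127}-1\) and the base 3 are toy parameters chosen for illustration only; practical deployments use much larger standardized groups.

```python
import secrets

P = 2 ** 127 - 1   # illustrative prime modulus (far too small for real use)
G = 3              # illustrative base


def dh_keypair(p=P, g=G):
    """Draw a fresh random private key and compute the public value g^a mod p."""
    private = secrets.randbelow(p - 3) + 2   # uniform in [2, p - 2]
    return private, pow(g, private, p)


def dh_shared(their_public, my_private, p=P):
    """Both sides compute the same secret: (g^b)^a = (g^a)^b mod p."""
    return pow(their_public, my_private, p)


# A new key pair is generated for every transaction, as the text requires.
a_private, a_public = dh_keypair()
b_private, b_public = dh_keypair()
assert dh_shared(b_public, a_private) == dh_shared(a_public, b_private)
```

An eavesdropper sees only p, g, and the two public values; recovering either private key from them is the discrete logarithm problem.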

Hybrid cryptography using Homomorphic ECC & Blowfish approach

The proposed model uses a hybrid cryptographic scheme based on homomorphic ECC and Blowfish, which applies multilevel encryption to the data exchanged between server and client in a public SaaS model. It is essential to preserve confidentiality before outsourcing or sending information in either direction between the client and the cloud. Unauthorized access by non-permitted users is thereby blocked, which guards against security threats from intruders. In this hybrid model, the data uploaded to the cloud server are encrypted using Blowfish to enhance security and preserve data privacy. For stronger protection, the keys used in the encryption process are themselves encrypted by the ECC approach. The hybrid model not only guarantees integrity and confidentiality but also offers authenticity. It combines two kinds of cryptography, a symmetric algorithm (Blowfish) and an asymmetric algorithm (ECC), to benefit the encryption process. Blowfish, the symmetric algorithm, encrypts the data kept in the cloud, and decryption is the reverse process carried out when the data are retrieved. ECC, the asymmetric algorithm, is employed for the management and encryption of the encryption keys. Integrating homomorphic ECC with Blowfish enhances security in mobile environments, since ECC provides strong security with small key sizes compared to traditional schemes such as RSA: a 160-bit ECC key offers roughly the same security level as a 1024-bit RSA key. Similarly, Blowfish, which encrypts data in 64-bit blocks, is known for its speed and efficiency in small, resource-constrained environments such as Android.
Combining these two schemes improves confidentiality and is valuable for mobile environments because the data are not fully exposed to the service provider. With this option, hybrid cryptography is more efficient than AES and other traditional methods. The proposed homomorphic ECC and Blowfish scheme can also perform computation on protected data, so that the information in the mobile data remains protected even during processing, in contrast to traditional security schemes.

The approach aims to protect the data exchanged between server and client in SaaS. Hence, the client encrypts the data to be uploaded before they are transferred to the server. The reverse process is applied when downloading: the client decrypts the data downloaded from the server so that they can be used. The functioning of the system is described below.

Data uploading

Before uploading, the data are encrypted from plain text to cipher text using Blowfish. The encryption keys are then encrypted by the ECC algorithm. Finally, both the encrypted files and the encrypted secret keys are sent to the cloud server as cipher text.

Data downloading

The reverse process is carried out when downloading data. First, the encryption key is decrypted using the ECC approach. The recovered key is then used to decrypt the data using Blowfish, and the plain text is thereby recovered. In this way, the files remain secure: unauthorized users cannot access them without performing decryption. The multilayer encryption uses Blowfish as the first layer and ECC as the second: Blowfish is applied to the input text, and the Blowfish keys are then encrypted through ECC in the next layer, which yields the final encryption output. For ease of understanding, the variables and functions \(Fb, E(b,k), P(F), Ek,\) and \(Pk\) used in the suggested approach are described below.

Only the encryption algorithm is presented, since decryption is simply the reverse process:

Algorithm 3

Hybrid Homomorphic ECC & Blowfish Approach.

Here, \(D\) is the input data file, E is the encrypted file, and N signifies the number of blocks in \(D\). \(E(b,k)\) signifies the encryption function that encodes the text blocks \(b\) together with the key \(k\) using the projected \(IABE-PPKGC\) system. The function \(P(F)\) allows the encrypted files (Fe) to be sent to the cloud server. The function \(Ek\) encodes the Blowfish key using the \(ECC\) system. \(Pk\) signifies the function that sends the encrypted key \((K1)\), produced by \(Ek\), to the cloud server.

The mathematical formulation of the proposed hybrid cryptography is given below. Elliptic curve cryptography involves key generation, key exchange, and symmetric key derivation, as follows.

(i)

Let PK denote the private key and PuK the public key. The private key is an integer selected in the range (1, n-1), where n is the order of the base point ECP (Elliptic Curve Point). PuK is then computed as in Eq. (5)

$$PuK=PK\,ECP$$

(5)

(ii)

Key exchange is then performed to obtain a shared secret key SK, which is computed as in Eq. (6)

$$SK=P{K}_{1}Pu{K}_{2}=P{K}_{1}\left(P{K}_{2}ECP\right)=P{K}_{2}\left(P{K}_{1}ECP\right)=P{K}_{2}Pu{K}_{1}$$

(6)

(iii)

The symmetric key SyK is then derived from the secret key SK using a key creation function KCF, as expressed in Eq. (7)

$$SyK=KCF\left(SK\right)$$

(7)

(iv)

Blowfish is divided into two parts, encryption and decryption, which use smaller keys to reduce computational overhead. P denotes plaintext and C denotes ciphertext. The data are processed in blocks of size b, indexed by i, as expressed in Eqs. (8) and (9)

$${C}_{i}=Blowfis{h}_{SyK}({P}_{i})$$

(8)

$${P}_{i}=Blowfis{h}_{SyK}^{-1}({C}_{i})$$

(9)

(v)

The final output is obtained by concatenating all decrypted blocks, which recovers the data from the Android ransomware encrypted data as the plaintext P, as expressed in Eq. (10)

$$P = P_{1} \,\|\, P_{2} \,\|\, \cdots \,\|\, P_{n}$$

(10)
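The steps (i) to (v) can be traced end to end with the following sketch, written under several loudly stated assumptions. The curve \(y^{2}=x^{3}+2x+2\) over \(\mathbb{F}_{17}\) with base point (5, 1) of order 19 is a textbook toy curve and offers no security. SHA-256 stands in for the key creation function KCF, which the text does not specify. The block cipher is a small hash-based Feistel network on 64-bit blocks used as a stand-in; it is not Blowfish, and the padding scheme and block-by-block mode are also assumptions. All function names are hypothetical.

```python
import hashlib

# Toy curve y^2 = x^3 + 2x + 2 over F_17; base point ECP has order n = 19.
P_MOD, A_COEF, ECP, N_ORDER = 17, 2, (5, 1), 19


def ec_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        slope = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    return (x3, (slope * (x1 - x3) - y1) % P_MOD)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add, as used for PuK and SK."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def derive_key(shared_point):
    """Stand-in KCF: hash the x-coordinate of SK into a 16-byte SyK."""
    return hashlib.sha256(str(shared_point[0]).encode()).digest()[:16]


def _round(half, key, r):
    return hashlib.sha256(key + bytes([r]) + half).digest()[:4]


def _feistel(block, key, round_order):
    left, right = block[:4], block[4:]
    for r in round_order:
        mixed = bytes(a ^ b for a, b in zip(left, _round(right, key, r)))
        left, right = right, mixed
    return right + left   # final swap makes decryption the same loop reversed


def encrypt(data, key):
    """C_i = E_SyK(P_i) on 8-byte blocks, after length-byte padding."""
    pad = 8 - len(data) % 8
    data += bytes([pad]) * pad
    return [_feistel(data[i:i + 8], key, range(16)) for i in range(0, len(data), 8)]


def decrypt(blocks, key):
    """P_i = E_SyK^{-1}(C_i), then P = P_1 || P_2 || ... || P_n."""
    plain = b"".join(_feistel(c, key, reversed(range(16))) for c in blocks)
    return plain[:-plain[-1]]


pk1, pk2 = 5, 9                                    # private keys in (1, n-1)
puk1, puk2 = ec_mul(pk1, ECP), ec_mul(pk2, ECP)    # public keys
sk = ec_mul(pk1, puk2)                             # shared secret
assert sk == ec_mul(pk2, puk1)
syk = derive_key(sk)                               # symmetric key
message = b"android app data"
assert decrypt(encrypt(message, syk), syk) == message
```

Both parties reach the same SK without ever transmitting a private key, so only the public keys and the ciphertext blocks cross the channel. Encrypting each block independently mirrors Eqs. (8) and (9) literally, although a practical design would normally add a chaining mode.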

The main focus of this research is to enhance the deep learning model with the SSO algorithm, which optimizes the significant patterns that distinguish malicious from normal Android data, and to improve classification accuracy through powerful feature extraction using the saliency AlexNet, which generalizes better than traditional deep learning models. Finally, the hybrid cryptography provides high security on Android devices by using smaller key sizes and Blowfish integration, making it a fast cryptographic model that achieves integrity and minimizes computational overhead in the cloud environment where all information is stored [34].


https://www.nature.com/articles/s41598-024-70544-x