Python and Oracle Cloud: loading data

Easy!Well, obviously there is the documentation, but, if you’re using PyPi, installation is only one line of code:pip install ociIt works also with Anaconda.

If you want to be one of the first to try the new features added, simply check out the OCI Python Github repository.

ConfigurationThen, you need to do some configuration tasks, needed to ensure secure communication between your working environment and your OCI Cloud.

I’ll spend some time to clarify here since I don’t want you to lose precious time on this part.

First, you need to create an API signing key-pair, in PEM format.

You must keep the private safe on your environment (the computer running the Python program) and store the public key in your Cloud profile.

When you store it, you can read from Cloud UI the fingerprint.

Some more details can be found here:Required Keys and OCIDsLearn about keys and OCIDs required when using an Oracle client or a client you built yourself.

docs.

cloud.

oracle.

comThen, with every type of clients (ObjectStorageClient, ComputeClient, ….

) you need to supply a set of (key, value) pair, packed as a Dictionary, in order to specify:your OCI userida key (API signing key) used to secure communicationthe fingerprint of this keyyour Cloud tenant identifierOCI RegionThis is an example of what you need to prepare (anonymized):config = {“user”: “ocid1.

user.

oc1.

XXXXX”,“key_file”: “/Users/lsaetta/Progetti/xxxx/oci_api_key.

pem”,“fingerprint”: “YYYYY”,“tenancy”: “ocid1.

tenancy.

oc1.

ZZZZZZ”,“region”: “eu-frankfurt-1”}For user and your Cloud tenant, you need to use not the name, but the identifier OCID, that you can copy from the OCI Cloud Console.

Loading files in the CloudAt this point, you’re ready to load a set of files as Objects in the ObjectStore.

This is the code to load a single file, I’ll comment sooner:As you can see, you need:to create a client, in this case an ObjectStorageClient, and pass to the constructor the configuration Dictionarycall the method put_object with bucket_name, file_name, file_path as parametersThat’s all.

In the Github repository, you’ll find an enhanced version, handling a set of files.

Store credentials inside the DBNow, it is time to copy data from ObjectStore to ADWC tables.

First of all: we need to create a set of credentials and store them inside the DB, in order to enable PL/SQL code to access the ObjectStorage Bucket.

Go to the OCI UI and create, inside your profile (top right icon and UserSetting), an AuthToken:Authentication TokenCopy from the UI the Authentication Token.

Remember to do that right after created, otherwise, you will not able to re-read it.

Second, store the Cloud Credentials inside the DB using this code; This step needs to be done only once:import cx_Oracleconnection = cx_Oracle.

connect(USER, PWD, 'adwc5_medium')# need to create a cursorcur = connection.

cursor()statement = """BEGIN DBMS_CLOUD.

CREATE_CREDENTIAL( credential_name => 'DEF_CRED_NAME', username => 'adwc_user@oracle.

com', password => '<AUTH Token>' );END;"""cur.

execute(statement)Take care: if your user is a federated one, the name is the fully qualified name.

It has a prefix like:oracleidentitycloudservice/and you must specify, in the CREATE_CREDENTIAL call, the fully qualified name.

Loading data in Oracle ADWCSince we have now the flat files in the ObjectStore, we can employ the PL/SQL package DBMS_CLOUD to load the data inside ADWC tables.

Ok, it is PL/SQL code, but not very complicated, and you can easily cut&paste.

We’ll wrap this code inside Python code.

We’re imaging the simpler scenario: one Oracle table for flat file and the format of the file is the same as the table (one-to-one correspondence between file and table fields)We have to create the table inside the DB.

For example:CREATE TABLE CHANNELS (channel_id char(1), channel_desc varchar2(20), channel_class varchar2(20) );/Then we can load the data contained in one file in this table, with this code:import cx_Oracleconnection = cx_Oracle.

connect(USER, PWD, 'adwc5_medium')# need to create a cursorcur = connection.

cursor()statement = """BEGIN DBMS_CLOUD.

COPY_DATA( table_name =>'CHANNELS1', credential_name =>'DEF_CRED_NAME', file_uri_list =>'https://swiftobjectstorage.

eu-frankfurt-1.

oraclecloud.

com/v1/tenant-name/BUCKET_NAME/channels1.

txt', format => json_object('delimiter' value ',') );END;"""cur.

execute(statement)One detail: to reference the Object you need to use the SWIFT-type URL.

What you need to change in the above example is:the Region Name (here it is eu-frankurt-1)the name of your tenantthe name of the bucket in Object Storethe name of the Object (here: channels1.

txt)As we can see, the DBMS_CLOUD package has all the required API to:store credentials inside the DBCOPY from ObjectStore to DBIt provides also some other nice features.

For example, you can use Apache Parquet format for the files, instead of CSV.

Want to check?If you want to check that data has really been loaded, you can use this snippet of code:import pandas as pdimport cx_Oracleconnection = cx_Oracle.

connect(USER, PWD, 'adwc5_medium')channel1 = pd.

read_sql("select * from CHANNELS1", con=connection)# if it is a Notebookchannel1.

head()External tableAnother option for Data Loading is to create in the DB an External Table, referring to the file.

A soon as you have the external table working, you can use DML instructions to load data inside the DB.

For example, you can use CTAS: CREATE TABLE AS SELECT.

The procedure to create the external table is:DBMS_CLOUD.

CREATE_EXTERNAL_TABLEmore details here.

ConclusionAs a Data Scientist, you need to work with data.

Sometimes you’ll need to load by yourself.

Sometimes, you’ll have a DBA to load for you.

In this article, I have shown how to use OCI Python SDK to:load files in the ObjectStorecopy data from ObjectStore to ADWC tablesIn addition, I have given a close look at the power of OCI Python SDK.

With it, you can create your highly customized set of tools to automate your daily work.

You can even create your ADWC instance or scale it up&down (adding cores and storage space).

In a future article, I’ll return to Machine Learning.

It is almost time to explore OML4Py.

One more thingIf you want to find more examples using OCI Python SDK, you can have a look here.. More details

Leave a Reply