AWS or Amazon Web Services are acting as significant virtual storage, compute, and network platform. The data powerhouse, AWS S3 is gaining traction among developers for simplifying web-scale computing. On the other hand, Athena is complementary service that streamlines data analysis in S3. Both these services are integral parts of the machine learning tech stack. We, at Oodles, as a provider of AI Development Services, discuss how to integrate and query AWS S3 data using Athena and Quicksight Visualization.
Amazon Simple Storage Service is storage for the net. AWS is designed to make web-scale cloud computing easier for developers.
Amazon S3 features a simple web services interface that you simply can use to store and retrieve any amount of knowledge, at any time, from anywhere on the online. It provides developer entrance to an equivalent highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon handles to run its global network of web sites.
The service proposes to optimize machine learning development for data-driven solutions.
Important: The Data should be stored in a bucket inside a folder. The single record of data should be stored in a single line and the next record start with the next line. The JSON fields end with a comma(,) then the next record should not contain any space.
Bucket Example: ProjectNameData -> folderName -> .json files or file
While we create a database and table in Athena, we need to specify the schema and the location of the data (S3 bucket location), making the data in the table ready for real-time querying.
Example:
Database Name: testDatabase
Table Name: testTable
Location: S3://ProjectNameData/folderName/ // S3 path S3://bucket/folder/
Row format: 'org.openx.data.jsonserde.JsonSerDe' // for SERDE to read JSON data it should be valid single object type (Other formats will not work)
Query: create database testDatabase;
Query: CREATE EXTERNAL TABLE IF NOT EXISTS `testDatabase`.testTable (
`id` int,
`dob` timestamp,
`name` string,
`address` string,
`email` string,
`company` STRUCT < `userId`: string, // This is for Nested data
`userRole`: array<string> >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 'S3://ProjectNameData/folderName/'
Query: select * from ‘testdatabase'.testtable;
At Oodles, we are an experiential team of AI software architects, developers, and data analysts. Our tech stack for machine learning is packed with robust tools and technologies like AWS, Azure, Google Cloud, and IBM Watson. We build dynamic AI solutions and applications that automate critical business operations effectively.
Join hands with our AI Development team to explore more about our AI capabilities.