Read the entire assignment before you start working on the assignment.A complete report is expected to get full marks.
In this assignment, students will experiment with database indexing and generate a report based on their findings. Each assignment must be unique to each student and group work are not allowed. Students will gain an insight into what happens when a relation grows in terms of the number of records and the role that the indexing plays on these relations. And the student will be also able to understand the concept of query optimization.
Students will be given (uploaded to Canvas) data files in CSV (comma separated value) format. One single relation which is a list of around 1.7 million records (database table) will be given as three parts (charitables_data_1.csv, charitables_data_2.csv, and charitables_data_3.csv).
Another relation which is metro cities in the USA given as a separate CSV file (metro_cities.csv).
Create two tables (charitable_organization and metro_city) with appropriate data types and lengths for each attribute. Select a primary key for each relation.(10 marks)
Load CSV files into two relations that you created in Task 1. Students must be very careful in this task as if you create a table with smaller field size for a given data in the CSV file, the complete data point may not be loaded into the table. (20 marks)
Note: It may take minutes or if not hours to load this dataset depending on your computer performance. If the data loading fails, try smaller chunks of each of the three files given below. Hopefully it will be loaded as a single file and can be done in three data loads. Metro city will be loaded quickly as it is a smaller relation. You need to research on how to load CSV into MySQL through the MySQL client of your choice (MySQL workbenchpreferred).
- Do not load all three files into the charitable_organization relation at once as you need to record performance for three different relation sizes for the Charitable_organization
Run the following queries and record the performance (time it takes to return the requested result). This time value can be obtained from the MySQL client that you use. Please refer to the video demonstration in Canvas on how to get this time from the MySQL workbench.
- Select co.name, co.city, co.state from charitable_organization as co where co.city=”Denton” and co.state=”TX”;
- Select state, count(*) as num_of_cities from charitable_organization as co group by state;
- select co.name,mc.metro_city from charitable_organization as co
join metro_city as mc on
co.city=mc.city and co.state = mc.state
You must run the above three SQL after loading the first data file (charitables_data_1.csv), after loading the first two data files (charitables_data_1.csv + charitables_data_2.csv), and after loading all three data files(charitables_data_1.csv + charitables_data_2.csv+ charitables_data_3.csv).Metro city table must be loaded first.
There will be 9 performance values for the task 3.(30 marks)
Add appropriate indexing to each of the two relations. Redo the Task 2 and record the performance. Note: You do not need to do it for partial data loading. Only do this for the entire dataset. There will be three performance values for each SQL given in task 2.(30 marks)
Explain in your own words, how and why the performance increased or decreased after creating the corresponding indices.(10 marks)
Note: If your computer is having technical issue, we (Instructors/graders/TAs) are not the contact point. Please try to find if is there any other options for you. There are CSE servers that you can log into and use. You can log in with your EUID. I believe there should be MySQL servers as well. We will notify you once we find the availability of such a service for MySQL.
You can use explain keyword to get a details execution plan of the query you need to run as follows. Refer to the video uploaded to Canvas.
Explain Select co.name, co.city, co.state from charitable_organization as co where co.city=”Denton” and co.state=”TX”;
Include the output of the above SQL statement in your report for each SQL you run.
- Assignment status: Already Solved By Our Experts
- (USA, AUS, UK & CA PhD. Writers)
- CLICK HERE TO GET A PROFESSIONAL WRITER TO WORK ON THIS PAPER AND OTHER SIMILAR PAPERS, GET A NON PLAGIARIZED PAPER FROM OUR EXPERTS