> [!summary]+ Summary
> This page describes a workshop designed to introduce strategies for communicating with LLMs. In addition to discussing prompting and verb use, we demonstrated strategies we called 'deeper', 'bigger', 'solutions', and 'explanation'. We also began to address whether LLMs should be seen as tools or partners. (See the bottom of this page for the results of this workshop.)

# Code Smarter, Not Harder: Leveraging AI for Enhanced Programming

**Delivery details:**
<u>Date</u>: September 18, 2023
<u>Target audience</u>: University faculty, researchers, staff, students
<u>Delivery format</u>: Hybrid
<u>Duration</u>: 90 minutes

## About the workshop

This workshop was a joint session planned and delivered with two faculty members and one graduate student from the [GW Coders](https://gwcoders.github.io/studyGroup/) group at George Washington University. (I am not a coder, but I try to learn and have played with a few languages.) It took place at the start of the fall 2023 term, when a few faculty and staff, but mostly students, were interested in seeing what could be done with LLMs. We were all trying to figure out how best to communicate with LLMs for any sort of project, not just coding. So we decided to focus on communication strategies and to demonstrate the similar use patterns we had identified in discussion: digging deeper into topics; expanding discussion toward bigger ideas; identifying potential solutions to problems; and asking for explanations of recommendations. Ultimately, this was an exploratory session for all who attended.

### Description

> In this workshop, we'll explore Generative Artificial Intelligence (GAI) tools, such as ChatGPT, Code Interpreter, Microsoft Copilot, StarCoder, and others. We'll cover practical ways to effectively problem-solve and enhance your coding process. During our time, you will gather an overview of how to use GAI in your coding workflow, as well as strategies for interacting with these AI systems to get the most out of their capabilities. We will discuss potential security challenges, ethical considerations, and the responsible use of AI tools. Having some experience with generative Artificial Intelligence tools is useful but not necessary.

### Learning objectives

1. Recognize how to effectively leverage Generative Artificial Intelligence (GAI) tools to enhance your coding workflow, efficiency, and productivity.
2. Identify limitations and potential security challenges associated with GAI tools, as well as their impact on your coding practices.
3. Distinguish prompting strategies and ethical considerations related to using GAI in coding.

### Workshop goal

**The goal was for participants to have an opportunity to engage with faculty, staff, and students exploring the use of LLMs for work and projects, and to find a starting point and/or answers to their questions.** In alignment with the nature of coding communities, we created a space for open discussion of curiosities and problem solving.

### Verbs for training humans

As part of the workshop, we spoke early on about *how to communicate* with LLMs. One of the resources we pulled from was a list of verbs created by [descript](https://www.descript.com/blog/article/100-chatgpt-prompts-for-creators-speed-up-your-workflow-with-ai).
![[Verbs-for-prompts.jpg]]

## Workshop plan

<div class="container"><iframe class="responsive-iframe-pdf-port" src="https://1drv.ms/b/c/13829E5D2EB238DE/IQSYXB7aqTe6T6xYLwQwK05EAbadG6X1Z8RkxnxbkdoWvag" width="100%" height="550" frameborder="0" scrolling="no"></iframe></div>

*Note: No slides were built for this session. Generative Artificial Intelligence (genAI) was used interactively to demonstrate what could be done at the time.*

## Demonstration prompts

You may expand the items below to see the example prompts used with ChatGPT and the responses that were received. These were all done with paid ChatGPT accounts using GPT-4. The strategies run in sequence, and the data used in the conversation came from NASA. (The data is linked in the resources section.)

> [!example]- Strategy 1, Deeper
> **Deeper – A partner to explore ideas**
>
> **Question refinement**
> Prompt-1:
> I need some help digging deeper into a topic. Whenever I ask you a question, I need you to suggest to me a better version of the question to think about instead.
>
> Prompt-2:
> What are the benefits of using ChatGPT to assist me with learning to be a more proficient programmer?
>
> ChatGPT:
> A more nuanced question to consider could be: "How can ChatGPT specifically enhance various aspects of my programming skills, from understanding algorithms to debugging, and what are its limitations in the context of programming education?"
>
> Prompt-3:
> Please do the same with this question. What are some different techniques to handle missing data in a machine learning project?
>
> ChatGPT:
> A more refined question to explore might be: "What are the pros and cons of various techniques for handling missing data in machine learning projects, and how do these methods impact model performance and interpretability?"
>
> **Outline expansion**
> Prompt-1:
> I'd like you to act as an outline expander. Please generate a bullet point outline based on the input that I give you and then ask me which bullet point you should expand on. Each bullet point can have at most 3 sub bullets. For the bullet point I select, create a new outline. After each outline, ask me which bullet you should expand next.
>
> Prompt-2:
> Writing effective ChatGPT prompts for improved programming.
>
> ChatGPT:
> Here's the outline based on your input:
>
> 1. Introduction
>     - Importance of effective prompts
>     - Objectives of the guide
>     - Who should read this guide
> 2. Understanding ChatGPT
>     - How it works
>     - Limitations
>     - Use-cases in programming
> 3. Components of an Effective Prompt
>     - Clarity
>     - Specificity
>     - Contextual information
> 4. Techniques for Writing Prompts
>     - The "Ask-Tell" method
>     - Iterative refinement
>     - Leveraging token count
> 5. Examples and Case Studies
>     - Good vs Bad prompts
>     - Real-world applications
>     - Pitfalls to avoid
> 6. Testing and Validation
>     - Methods for testing
>     - Common pitfalls
>     - Feedback loops
> 7. Conclusion
>     - Summary of key points
>     - Next steps for implementation
>     - Additional resources
>
> Which bullet point would you like to see expanded next?
>
> …
>
> [Returning to the refined question from "Question refinement" above: "What are the pros and cons of various techniques for handling missing data…"]
> Prompt:
> Using this last question you gave me, please go back and be an outline expander. Same rules as above. Please give me an outline I can use to explore this question.
>
> ChatGPT:
> Certainly! Here's an outline based on the refined question:
>
> 1. Introduction
>     - Importance of handling missing data
>     - Scope of the discussion
>     - Intended audience
> 2. Common Reasons for Missing Data
>     - Data collection errors
>     - Incomplete sources
>     - Data sparsity
> 3. Impact of Missing Data
>     - Model performance
>     - Bias and fairness
>     - Interpretability
> 4. Techniques for Handling Missing Data
>     - Imputation methods
>     - Data augmentation
>     - Utilizing specialized algorithms
> 5. Pros and Cons of Various Techniques
>     - Effectiveness in different scenarios
>     - Computational cost
>     - Risk of introducing bias
> 6. Case Studies
>     - Real-world examples where techniques succeeded
>     - Real-world examples where techniques failed
>     - Lessons learned
> 7. Evaluation Metrics
>     - How to measure success
>     - Tools for assessment
>     - Caveats in evaluation
> 8. Conclusion
>     - Summary of key findings
>     - Recommendations for practice
>     - Further areas for research
>
> Which bullet point would you like to see expanded next?

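In the session we ran this pattern in the ChatGPT web interface, but it also scripts neatly. Below is a minimal sketch of the question-refinement pattern using the `openai` Python package; the model name, the wording of the system message, and the `refine_question` helper are illustrative assumptions, not something we used in the workshop.

```python
# A minimal sketch of the "question refinement" pattern from Strategy 1,
# scripted against the OpenAI chat API. The model name and system message
# wording are illustrative assumptions, not what was used in the workshop.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

REFINE_INSTRUCTIONS = (
    "Whenever I ask you a question, suggest a better version of the "
    "question to think about instead. Reply with only the refined question."
)

def refine_question(question: str) -> str:
    """Ask the model to propose a sharper version of a question."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[
            {"role": "system", "content": REFINE_INSTRUCTIONS},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(refine_question(
        "What are some different techniques to handle missing data "
        "in a machine learning project?"
    ))
```

Pinning the instruction in the system message keeps every subsequent question refined without restating the setup, which is exactly what Prompt-1 does conversationally.
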
> [!example]- Strategy 2, Bigger
> **Bigger/Broader – Wide exploration**
>
> **Flipped interaction**
> Prompt-1:
> I would like you to ask me questions to help me create a visualization of historical land and ocean temperatures with data from NASA. Ask me questions until you have enough information to identify the easiest approach to creating my visualization. Please ask me one question at a time.
>
> ChatGPT:
> Absolutely, let's get started. What is your primary goal for creating this visualization of historical land and ocean temperatures? Are you looking to understand trends, compare regions, or something else?
>
> Prompt-2:
> I want to communicate to my undergraduate students the general trend in the data.
>
> ChatGPT:
> Great, that helps clarify the purpose. What type of data are you working with? Is it time-series data, geospatial data, or something else?
>
> Prompt-3:
> It is time-series data. It starts at the year 1880 and ends in 2022.
>
> ChatGPT:
> Excellent, time-series data will make it easier to show trends over time. What software or programming languages are you comfortable using for data visualization? Do you have a preference for tools like Excel, Tableau, or coding languages like Python or R?
>
> Prompt-4:
> I prefer to use Excel because I have worked with it for many years. But I am open to learning Python. Generally, I'm looking for what will give me the most flexible output.
>
> ChatGPT:
> Understood, both Excel and Python have their merits. Excel is simpler and more immediate, while Python, particularly with libraries like Matplotlib or Seaborn, offers more customization and flexibility. Do you already have the NASA data, and in what format is it available? Is it a CSV file, Excel spreadsheet, or something else?
>
> Prompt-5:
> Yes, I have the data in a CSV file. Let's go with Python.
>
> ChatGPT:
> Great, Python will offer you a lot of flexibility for creating detailed visualizations. One last question: Do you have any specific design elements you want to include, like annotations to point out significant years or events, or is a straightforward plot sufficient for your needs?
>
> Prompt-6:
> No, I just need to be sure that red, green, or yellow colors are not used for accessibility purposes.
>
> ChatGPT:
> Thank you for sharing that important detail about accessibility. Given all the information you've provided, the easiest approach for you would be to use Python with the Matplotlib library to plot your time-series data. You can customize the color palette to avoid red, green, and yellow. Your data is already in a CSV file, which makes it easy to import into Python for manipulation and visualization.
>
> Would you like to proceed with some code examples or further guidance on how to go about this?
>
> Prompt-7:
> Yes, please give me some code examples I can start with.
>
> ChatGPT:
> Certainly! Below is a basic Python code snippet that utilizes the Pandas library for data handling and Matplotlib for plotting. This code assumes that your NASA data in the CSV file has two columns: one for the year (`Year`) and one for the temperature (`Temperature`). Feel free to adjust the column names as they appear in your dataset.
>
> First, you'll need to install the required Python packages if you haven't done so already:
>
> ```
> pip install pandas matplotlib
> ```
>
> Here's a sample code snippet:
>
> ```python
> import pandas as pd
> import matplotlib.pyplot as plt
>
> # Load the data from the CSV file
> data = pd.read_csv('path/to/your/file.csv')
>
> # Extract the Year and Temperature data
> years = data['Year']
> temperatures = data['Temperature']
>
> # Create the plot
> plt.figure(figsize=(12, 6))
>
> # Plot the data
> plt.plot(years, temperatures, color='blue', linewidth=2)
>
> # Customize the plot
> plt.title('Historical Land and Ocean Temperatures')
> plt.xlabel('Year')
> plt.ylabel('Temperature (°C)')
> plt.grid(True)
>
> # To ensure the colors are accessible, we use 'blue' for the line
> plt.gca().set_facecolor('white')
>
> # Save or show the plot
> plt.savefig('Historical_Temperatures.png')  # Save the plot as a PNG file
> # plt.show()  # Uncomment this line if you want to display the plot
> ```
>
> Make sure to replace 'path/to/your/file.csv' with the actual path to your CSV file.
>
> This code will produce a simple line plot with the x-axis representing the years and the y-axis representing the temperatures. The line is colored blue for accessibility purposes.
>
> Feel free to adjust the code to suit your specific needs, and let me know if you have any further questions!

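A practical wrinkle the conversation glossed over: it assumes a tidy CSV, but the NASA file linked in the resources section is a plain-text table with a few description lines above whitespace-separated Year / No_Smoothing / Lowess(5) columns. Here is a hedged sketch of loading it directly with pandas; the `skiprows` value and column names are assumptions, so inspect the file and adjust if the layout differs.

```python
# Sketch: loading NASA's GISTEMP graph.txt directly with pandas.
# Assumption: a few descriptive header lines followed by whitespace-
# separated columns Year, No_Smoothing, Lowess(5) -- inspect the file
# first and adjust skiprows/names if the layout differs.
import pandas as pd

URL = ("https://data.giss.nasa.gov/gistemp/graphs/graph_data/"
       "Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt")

df = pd.read_csv(
    URL,
    sep=r"\s+",    # columns are whitespace-separated, not comma-separated
    skiprows=5,    # assumed number of header/description lines
    names=["year", "no_smoothing", "lowess_5"],
)

print(df.head())
print(df["year"].min(), "-", df["year"].max())
```
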
> [!example]- Strategy 3, Solutions
> **Solutions – Ask for multiple solutions**
>
> **Alternative approaches**
> Prompt-1:
> I want to explore something with multiple solutions and problems. For example, I know there can be different functions in Python to calculate the mean of a list of numbers. First please generate a list of 20 random numbers. Then using those numbers, please show me 3 different Python functions for calculating the mean of that number list. I'd like to see the functions ranked from least to most efficient.
>
> ChatGPT:
> Certainly! To begin, let's generate a list of 20 random numbers. We'll use Python's `random` library for this.
>
> ```python
> import random
>
> # Generate 20 random numbers between 1 and 100
> random_numbers = [random.randint(1, 100) for _ in range(20)]
>
> print("List of 20 random numbers:", random_numbers)
> ```
>
> Once you've run this code snippet, you should see a list of 20 random numbers. We'll use this list in the following examples.
>
> Next, let's move on to the three Python functions to calculate the mean of this list of numbers, ranked from least to most efficient.
>
> **Least Efficient: Manual Iteration**
> Here, we manually loop through the list to find the sum and then divide by the number of elements.
>
> ```python
> def manual_mean(numbers):
>     total = 0
>     for number in numbers:
>         total += number
>     return total / len(numbers)
>
> print("Manual mean:", manual_mean(random_numbers))
> ```
>
> **Moderately Efficient: Using `sum()` and `len()`**
> This approach uses Python's built-in `sum()` and `len()` functions to achieve the same result but in a more Pythonic way.
>
> ```python
> def builtin_mean(numbers):
>     return sum(numbers) / len(numbers)
>
> print("Built-in mean:", builtin_mean(random_numbers))
> ```
>
> **Most Efficient: Using NumPy**
> Using NumPy's `mean()` function is the most efficient way when dealing with large datasets, as NumPy is optimized for numerical calculations.
>
> First, install NumPy if you haven't already:
>
> ```
> pip install numpy
> ```
>
> Then, run this code:
>
> ```python
> import numpy as np
>
> def numpy_mean(numbers):
>     return np.mean(numbers)
>
> print("NumPy mean:", numpy_mean(random_numbers))
> ```
>
> Each method will give you the same result, but their efficiency varies. The NumPy approach is generally the fastest, especially for large datasets, followed by the built-in `sum()` and `len()` methods. The manual loop is typically the least efficient.

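ChatGPT asserted this efficiency ranking rather than measuring it. The claim is easy to check with the standard library's `timeit` module; the sketch below reuses the three functions from the transcript. One caveat worth knowing: on a 20-item Python list, `np.mean()` is often the slowest of the three because of list-to-array conversion overhead, and NumPy only pulls ahead on large arrays.

```python
# Sketch: timing the three mean implementations from the transcript with
# the standard-library timeit module. Results vary by machine; on small
# lists NumPy's conversion overhead can make np.mean the slowest.
import random
import timeit

import numpy as np

def manual_mean(numbers):
    total = 0
    for number in numbers:
        total += number
    return total / len(numbers)

def builtin_mean(numbers):
    return sum(numbers) / len(numbers)

def numpy_mean(numbers):
    return np.mean(numbers)

for size in (20, 100_000):
    data = [random.randint(1, 100) for _ in range(size)]
    arr = np.array(data)  # pre-converted array, NumPy's intended use case
    for name, call in [
        ("manual_mean  ", lambda: manual_mean(data)),
        ("builtin_mean ", lambda: builtin_mean(data)),
        ("numpy (list) ", lambda: numpy_mean(data)),
        ("numpy (array)", lambda: arr.mean()),
    ]:
        seconds = timeit.timeit(call, number=1000)
        print(f"n={size:>7} {name}: {seconds:.4f}s per 1000 calls")
```

This also makes a nice live demonstration of the workshop's larger point: "most efficient" answers from an LLM deserve verification.
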
> [!example]- Strategy 4, Explained
> **Explained – Explanation of the reasoning behind solutions**
>
> **Problem-Solutions-Rationale**
>
> Prompt-1:
> Please behave as though you are an expert data visualizer. I'm working with global temperature data. What types of visualizations would be appropriate for this type of data?
>
> ChatGPT:
> Certainly, visualizing global temperature data can be a complex but rewarding task. The type of visualization you choose will depend on the specific questions you're trying to answer, the audience you're targeting, and the data you have available. Here are some types of visualizations that are commonly used for this kind of data:
>
> **Time Series Charts**
> 1. Line Graphs: These are useful for showing temperature trends over time. You could plot global average temperature for each year to show global warming.
> 2. Area Charts: Similar to line graphs but the area under the line is filled in, often used to represent volume (e.g., the number of unusually hot days each year).
>
> **Geospatial Visualizations**
> 3. Heatmaps: These can show how temperature varies across different geographical locations.
> 4. Choropleth Maps: These use different shades of a color to show temperature variations across different regions or countries.
> 5. Globe Visualizations: 3D representations can be very impactful, especially for showing polar changes.
>
> **Comparative Visualizations**
> 6. Bar Charts: Useful for comparing temperatures between different cities, countries, or years.
> 7. Box Plots: These can show the distribution of temperatures for different time periods or locations.
> 8. Radar Charts: Useful for comparing the temperature profiles of different locations throughout the year.
>
> **Multivariate Visualizations**
> 9. Scatter Plots: These could be used to show the relationship between temperature and another variable, like CO2 levels.
> 10. Bubble Charts: Similar to scatter plots, but the size of the bubble can represent a third variable.
> 11. Parallel Coordinates: Useful for plotting multivariate, numerical data.
>
> **Specialized Climate Visualizations**
> 12. Anomaly Maps: These show how temperatures are diverging from a long-term average.
> 13. Streamgraphs: These can be used to show temperature changes and anomalies over time in a more fluid form than line graphs.
> 14. Seasonal Decomposition Plots: These can break down historical temperature data into trend, seasonality, and noise.
>
> **Interactive Visualizations**
> 15. Sliders: Allow viewers to change the time range.
> 16. Dropdowns: Allow viewers to select different countries or regions.
> 17. Zoom/Pan Options: Useful in maps and line graphs for detailed views.
>
> **Storytelling Elements**
> 18. Annotations: To highlight significant events or anomalies.
> 19. Infographics: To provide context or additional information.
> 20. Dashboards: To combine multiple visualizations into a single view for comprehensive analysis.
>
> When creating these visualizations, keep in mind the principles of good data visualization: clarity, accuracy, efficiency, and aesthetics. Make sure to also provide appropriate context and scales, and consider your audience's level of expertise and interest in the subject matter.
>
> Prompt-2:
> Awesome, this is very helpful. Now, I have a large data set of land-ocean temperatures starting in the year 1880 and ending at the year 2022. I have 3 columns in the dataset. These are 'year', 'no_smoothing', and 'lowess(5)'. Of the types of visualization you identified, which would be the most simple to produce and visually appealing? Please note, I have limited expertise using Python.
>
> ChatGPT:
> Given your dataset and limited expertise in Python, a Time Series Line Graph would be the most straightforward and visually appealing way to represent your land-ocean temperature data. Line graphs are excellent for showing trends over time, and they are relatively easy to create using Python libraries like Matplotlib or Seaborn.
>
> You can plot both the 'no_smoothing' and 'lowess(5)' data on the same graph to show the raw data alongside a smoothed trend. The 'lowess(5)' column likely represents a smoothed version of the temperature data, which can help to identify long-term trends without the noise of year-to-year variation.
>
> Here's a simple example using Matplotlib to create a line graph:
>
> ```python
> import matplotlib.pyplot as plt
> import pandas as pd
>
> # Assuming your data is in a CSV file called 'temperature_data.csv'
> df = pd.read_csv('temperature_data.csv')
>
> # Create the plot
> plt.figure(figsize=(12, 6))
>
> # Plot the 'no_smoothing' data
> plt.plot(df['year'], df['no_smoothing'], label='No Smoothing', color='blue')
>
> # Plot the 'lowess(5)' data
> plt.plot(df['year'], df['lowess(5)'], label='LOWESS (5)', color='red')
>
> # Add titles and labels
> plt.title('Global Land-Ocean Temperatures (1880-2022)')
> plt.xlabel('Year')
> plt.ylabel('Temperature (°C)')
>
> # Add a legend
> plt.legend()
>
> # Show the plot
> plt.show()
> ```
>
> This will produce a line graph with two lines: one for the 'no_smoothing' data and another for the 'lowess(5)' data. The x-axis will represent the years, and the y-axis will represent the temperatures.
>
> This is a simple but effective way to visualize your data, and it should be accessible given your limited expertise in Python.
>
> Prompt-3:
> If I were an expert with Python, then what would you recommend instead? Can you give me an example?
>
> ChatGPT:
> If you're an expert with Python, you have a lot more options for creating sophisticated and interactive visualizations. One powerful library for this is Plotly, which allows you to create interactive, web-based visualizations with ease. You could create an interactive line chart that allows users to zoom, pan, and hover over data points to see more information. You could also add sliders or dropdowns to switch between different smoothing methods or to focus on specific time periods.
> Here's an example using Plotly to create an interactive line chart with hover text and the ability to toggle between the 'no_smoothing' and 'lowess(5)' data:
>
> ```python
> import plotly.express as px
> import pandas as pd
>
> # Assuming your data is in a CSV file called 'temperature_data.csv'
> df = pd.read_csv('temperature_data.csv')
>
> # Melt the DataFrame to make it suitable for Plotly
> df_melted = df.melt(id_vars='year', value_vars=['no_smoothing', 'lowess(5)'],
>                     var_name='Smoothing', value_name='Temperature')
>
> # Create the interactive plot
> fig = px.line(df_melted, x='year', y='Temperature', color='Smoothing',
>               title='Global Land-Ocean Temperatures (1880-2022)',
>               labels={'Temperature': 'Temperature (°C)', 'year': 'Year'},
>               hover_data={'year': True, 'Temperature': True, 'Smoothing': True})
>
> # Show the plot
> fig.show()
> ```
>
> This will create an interactive line chart where you can hover over points to see their values, and you can toggle the visibility of each line ('no_smoothing' and 'lowess(5)') by clicking on their names in the legend. You can also zoom in and out to examine specific periods more closely.
>
> This approach provides a more engaging user experience and allows for deeper exploration of the data. It's also not much more complicated to set up than a basic Matplotlib plot, thanks to the high-level interface provided by Plotly.
>
> Prompt-4:
> Can you explain to me why you chose these different approaches based on my stated level of expertise with Python?
>
> ChatGPT:
> Certainly! The choice of visualization approach is often influenced by a combination of factors, including the complexity of the data, the questions you're trying to answer, and your technical expertise. Here's why I recommended different approaches based on your level of expertise with Python:
>
> **Limited Expertise in Python**
> 1. Simplicity: For those with limited coding experience, it's often best to start with something straightforward. A line graph is one of the simplest types of visualizations to understand and create.
> 2. Library Familiarity: Matplotlib is a well-documented and widely-used library that's often recommended for beginners. It offers a relatively gentle learning curve.
> 3. Quick Insights: A line graph can provide immediate, easy-to-understand insights into trends over time, which is what you're most likely interested in with global temperature data.
> 4. Less Room for Error: Simpler code means fewer places where things can go wrong, which is an important consideration for those who are less comfortable with coding.
>
> **Expertise in Python**
> 1. Interactivity: With more Python experience, you can leverage advanced libraries like Plotly to create interactive visualizations. Interactivity allows users to explore the data in a more engaging way.
> 2. Data Complexity: Expert users can handle more complex data manipulations. For example, the Plotly example uses a "melted" DataFrame, a concept that might be confusing for beginners.
> 3. Customization: Advanced libraries offer more options for customization, allowing you to create visualizations that are tailored to your specific needs and questions.
> 4. User Experience: With more expertise, you can focus not just on displaying the data, but on optimizing the user experience, making the data more accessible and understandable.
>
> In summary, the choice of approach is a balance between the richness and complexity of the visualization and the ease with which it can be implemented. A simpler approach offers quick insights and is easier to create, while a more complex approach allows for deeper exploration and a more engaging user experience.

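One follow-up we did not capture in the transcript: for a classroom audience, an interactive Plotly chart can be exported as a self-contained HTML file that students open in a browser, no Python required. A small sketch under the same assumptions as the Plotly example above (the file names and column layout are the transcript's, not a verified NASA file):

```python
# Sketch: exporting an interactive Plotly chart as a standalone HTML file
# that opens in any browser -- handy for sharing with students who don't
# run Python. CSV path and column names follow the transcript's assumptions.
import pandas as pd
import plotly.express as px

df = pd.read_csv('temperature_data.csv')
df_melted = df.melt(id_vars='year', value_vars=['no_smoothing', 'lowess(5)'],
                    var_name='Smoothing', value_name='Temperature')

fig = px.line(df_melted, x='year', y='Temperature', color='Smoothing',
              title='Global Land-Ocean Temperatures (1880-2022)')

# include_plotlyjs="cdn" keeps the file small by loading plotly.js from a CDN
fig.write_html('temperatures_interactive.html', include_plotlyjs='cdn')
```
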
## 🎯 Results

This workshop helped attendees by providing an initial framework for approaching work with genAI. Students in particular reported leaving with a sense of how they might begin to apply genAI in their own work. Content created for this workshop was later used by me and partnering faculty in other presentations and workshops.

## Resources

- [GISS Surface Temperature Analysis (GISTEMP v4)](https://data.giss.nasa.gov/gistemp/)
- [Data file: Global mean estimates based on land and ocean data](https://data.giss.nasa.gov/gistemp/graphs/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt)
- Tyler, W. *[100+ ChatGPT prompts for creators: Speed up your workflow with AI](https://www.descript.com/blog/article/100-chatgpt-prompts-for-creators-speed-up-your-workflow-with-ai)*. descript blog. March 29, 2023.