Author: whisper
Link: http://proprogrammar.com:443/article/704
Notice: Please respect the original author's work. If you repost this article, credit the source.
Activity table:
Activity
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+
This table records user activity on a social media website.
The table has no primary key and may contain duplicate rows.
The activity_type column takes one of four values: ('open_session', 'end_session', 'scroll_down', 'send_message').
Each session_id belongs to exactly one user.
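Write an SQL query that finds the number of daily active users for each day in the 30-day period ending 2019-07-27 (inclusive). A user counts as active on a given day if they have at least one activity record on that day.
An example of the query result: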
Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+
Result table:
+------------+--------------+
| day        | active_users |
+------------+--------------+
| 2019-07-20 | 2            |
| 2019-07-21 | 2            |
+------------+--------------+
Days with no active users do not need to appear in the result.
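select activity_date as day, count(distinct user_id) as active_users
from Activity
where date_format(activity_date, '%Y-%m-%d') > '2019-06-27'
  and activity_date <= '2019-07-27'  -- exclude rows dated after the window
group by activity_date;
The 30-day window ending 2019-07-27 starts on 2019-06-28, so the filter keeps dates after 2019-06-27, and the query counts the distinct users who were active on each day. Points to learn: date_format, and count combined with distinct. Because activity_date is already a DATE column, date_format is not strictly required here; comparing the column directly against the date string gives the same result.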
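To check the query logic against the sample data, here is a minimal sketch that loads the example rows into an in-memory SQLite database from Python. It is a stand-in for MySQL under stated assumptions: SQLite has no date_format, so the sketch stores dates as ISO text and filters with a plain BETWEEN over 2019-06-28 to 2019-07-27. The helper name daily_active_users is hypothetical.

```python
import sqlite3

# Sample rows from the example above:
# (user_id, session_id, activity_date, activity_type)
ROWS = [
    (1, 1, "2019-07-20", "open_session"),
    (1, 1, "2019-07-20", "scroll_down"),
    (1, 1, "2019-07-20", "end_session"),
    (2, 4, "2019-07-20", "open_session"),
    (2, 4, "2019-07-21", "send_message"),
    (2, 4, "2019-07-21", "end_session"),
    (3, 2, "2019-07-21", "open_session"),
    (3, 2, "2019-07-21", "send_message"),
    (3, 2, "2019-07-21", "end_session"),
    (4, 3, "2019-06-25", "open_session"),
    (4, 3, "2019-06-25", "end_session"),
]


def daily_active_users(rows):
    """Return [(day, active_users), ...] for 2019-06-28 through 2019-07-27."""
    conn = sqlite3.connect(":memory:")
    # Assumption: dates are stored as ISO-8601 text, so string order
    # matches chronological order.
    conn.execute(
        "CREATE TABLE Activity (user_id INT, session_id INT, "
        "activity_date TEXT, activity_type TEXT)"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
        FROM Activity
        WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
        GROUP BY activity_date
        ORDER BY activity_date
        """
    ).fetchall()
    conn.close()
    return result


print(daily_active_users(ROWS))
```

On the sample rows this returns one row each for 2019-07-20 and 2019-07-21, both with 2 active users; user 4's activity on 2019-06-25 falls outside the window and is dropped.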
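Here is another solution:
SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE DATEDIFF('2019-07-27', activity_date) BETWEEN 0 AND 29
GROUP BY activity_date;
This is essentially the same query, with date_format replaced by datediff. DATEDIFF('2019-07-27', activity_date) is the number of days from activity_date to 2019-07-27, so the values 0 through 29 cover the 30-day window, and the lower bound of 0 excludes any rows dated after 2019-07-27.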
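The window arithmetic behind datediff can be checked with a few lines of Python. This is only an illustrative sketch: the datediff helper is a hypothetical stand-in that mimics MySQL's DATEDIFF(expr1, expr2) as expr1 minus expr2 in whole days, and in_window is likewise an invented name.

```python
from datetime import date

END = date(2019, 7, 27)  # last day of the window (inclusive)


def datediff(later, earlier):
    """Mimic MySQL DATEDIFF(later, earlier): whole days from earlier to later."""
    return (later - earlier).days


def in_window(activity_date):
    """True when activity_date is one of the 30 days ending on END."""
    # 0 <= diff keeps out dates after END; diff < 30 keeps out older dates.
    return 0 <= datediff(END, activity_date) < 30


print(datediff(END, date(2019, 6, 28)), in_window(date(2019, 6, 28)))
print(datediff(END, date(2019, 6, 27)), in_window(date(2019, 6, 27)))
print(datediff(END, date(2019, 8, 1)), in_window(date(2019, 8, 1)))
```

A date after 2019-07-27 yields a negative difference, so a condition that only checks "less than 30" would let it through. This matters only if the table can contain rows dated after the end of the window.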
Table:
Activity
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+
The table has no primary key and may contain duplicate rows.
The activity_type column is an ENUM of type ('open_session', 'end_session', 'scroll_down', 'send_message').
The table shows user activity on a social media website.
Note that each session belongs to exactly one user.
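Write an SQL query to find the average number of sessions per user for the 30-day period ending 2019-07-27 (inclusive), rounded to two decimal places. Only sessions that have at least one activity in that period are counted for a user.
The query result format is shown in the following example: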
Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 3       | 5          | 2019-07-21    | open_session  |
| 3       | 5          | 2019-07-21    | scroll_down   |
| 3       | 5          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+
Result table:
+---------------------------+
| average_sessions_per_user |
+---------------------------+
| 1.33                      |
+---------------------------+
Users 1 and 2 each had 1 session in the past 30 days, while user 3 had 2 sessions, so the average is (1 + 1 + 2) / 3 = 1.33.
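select ifnull(round(sum(uc.scount) / count(uc.user_id), 2), 0.00) as average_sessions_per_user
from (
    -- number of distinct sessions per user inside the window
    select user_id, count(distinct session_id) as scount
    from Activity
    where date_format(activity_date, '%Y-%m-%d') > '2019-06-27'
      and activity_date <= '2019-07-27'  -- exclude rows dated after the window
    group by user_id
) uc;
Using a subquery in the FROM clause makes the logic easy to follow, although efficiency may be a concern. Note that the query must return 0.00 when no user meets the condition: in that case sum(...) / count(...) evaluates to NULL, and ifnull turns it into 0.00. Point to learn: round(sum / count, 2) as a way of computing an average.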
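The two-step approach can be tried on the sample data with the following sketch, which again uses an in-memory SQLite database from Python as a stand-in for MySQL. Assumptions to note: dates are stored as ISO text and filtered with BETWEEN, the activity_type column is left out because the query never reads it, and the sum is multiplied by 1.0 because SQLite divides integers as integers, whereas MySQL's / operator does not. The function name average_sessions_per_user is hypothetical.

```python
import sqlite3

# Sample rows from the second example: (user_id, session_id, activity_date)
ROWS = [
    (1, 1, "2019-07-20"), (1, 1, "2019-07-20"), (1, 1, "2019-07-20"),
    (2, 4, "2019-07-20"), (2, 4, "2019-07-21"), (2, 4, "2019-07-21"),
    (3, 2, "2019-07-21"), (3, 2, "2019-07-21"), (3, 2, "2019-07-21"),
    (3, 5, "2019-07-21"), (3, 5, "2019-07-21"), (3, 5, "2019-07-21"),
    (4, 3, "2019-06-25"), (4, 3, "2019-06-25"),
]

QUERY = """
SELECT IFNULL(ROUND(SUM(uc.scount) * 1.0 / COUNT(uc.user_id), 2), 0.0)
FROM (
    -- step 1: distinct sessions per user inside the window
    SELECT user_id, COUNT(DISTINCT session_id) AS scount
    FROM Activity
    WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
    GROUP BY user_id
) AS uc
"""


def average_sessions_per_user(rows):
    """Average distinct sessions per active user, or 0.0 if nobody qualifies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity (user_id INT, session_id INT, activity_date TEXT)"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?)", rows)
    # step 2: the outer query averages the per-user counts
    (value,) = conn.execute(QUERY).fetchone()
    conn.close()
    return value


print(average_sessions_per_user(ROWS))
print(average_sessions_per_user([]))
```

With no qualifying rows, the inner query is empty, the division yields NULL, and IFNULL supplies the fallback value.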
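Here is another solution:
select
    if(round(count(distinct session_id) / count(distinct user_id), 2),
       round(count(distinct session_id) / count(distinct user_id), 2),
       0.00) as average_sessions_per_user
from Activity
where datediff('2019-07-27', activity_date) between 0 and 29;
Although there is still room for optimization (the round(...) expression is written twice, for instance), this approach is more direct: it divides the number of distinct sessions that meet the condition by the number of distinct users, whereas the previous solution takes an extra step through per-user counts. The two agree because each session belongs to exactly one user. Points to learn: datediff, and round(count(distinct col1) / count(distinct col2), 2) as a way of computing the average.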
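Why the one-pass ratio matches the per-user average can be illustrated in plain Python. The sketch below is not part of either SQL solution; the function names and the PAIRS list (the distinct user and session pairs inside the window from the example) are introduced here purely for illustration.

```python
from collections import defaultdict

# (user_id, session_id) pairs that fall inside the window in the example
PAIRS = [(1, 1), (2, 4), (3, 2), (3, 5)]


def two_step_average(pairs):
    """Count distinct sessions per user first, then average those counts."""
    sessions_by_user = defaultdict(set)
    for user_id, session_id in pairs:
        sessions_by_user[user_id].add(session_id)
    if not sessions_by_user:
        return 0.0
    total = sum(len(sessions) for sessions in sessions_by_user.values())
    return round(total / len(sessions_by_user), 2)


def one_pass_average(pairs):
    """Divide all distinct sessions by all distinct users."""
    users = {user_id for user_id, _ in pairs}
    sessions = {session_id for _, session_id in pairs}
    if not users:
        return 0.0
    return round(len(sessions) / len(users), 2)


print(two_step_average(PAIRS), one_pass_average(PAIRS))

# Hypothetical data that breaks the "one session, one user" rule:
SHARED = [(1, 1), (2, 1)]
print(two_step_average(SHARED), one_pass_average(SHARED))
```

On the example pairs both functions give the same value. On the hypothetical SHARED data, where one session is attributed to two users, they diverge, which shows that the shortcut relies on the problem's guarantee that each session belongs to exactly one user.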